GitHub - Toseic/LLM-inference-arxiv-daily: 🎓 Automatically updates LLM inference systems papers daily using GitHub Actions (updates every 12 hours)

Updated on 2026.03.09

inference
MoE

inference

Publish Date	Title	Authors	PDF	Code
2026-03-06	LUMINA: LLM-Guided GPU Architecture Exploration via Bottleneck Analysis	Tao Zhang et.al.	2603.05904	null
2026-03-05	Parallelization Strategies for Dense LLM Deployment: Navigating Through Application-Specific Tradeoffs and Bottlenecks	Burak Topcu et.al.	2603.05692	null
2026-03-05	Beyond the Context Window: A Cost-Performance Analysis of Fact-Based Memory vs. Long-Context LLMs for Persistent Agents	Natchanon Pollertlam et.al.	2603.04814	null
2026-03-05	SLO-Aware Compute Resource Allocation for Prefill-Decode Disaggregated LLM Inference	Luchang Li et.al.	2603.04716	null
2026-03-04	A Multi-Dimensional Quality Scoring Framework for Decentralized LLM Inference with Proof of Quality	Arther Tian et.al.	2603.04028	null
2026-03-03	SEALing the Gap: A Reference Framework for LLM Inference Carbon Estimation via Multi-Benchmark Driven Embodiment	Priyavanshi Pathania et.al.	2603.02949	null
2026-03-03	Agentic Self-Evolutionary Replanning for Embodied Navigation	Guoliang Li et.al.	2603.02772	null
2026-03-03	Ouroboros: Wafer-Scale SRAM CIM with Token-Grained Pipelining for Large Language Model Inference	Yiqi Liu et.al.	2603.02737	null
2026-03-02	Beyond Microservices: Testing Web-Scale RCA Methods on GPU-Driven LLM Workloads	Dominik Scheinert et.al.	2603.02057	null
2026-03-02	Learning to Draft: Adaptive Speculative Decoding with Reinforcement Learning	Jiebin Zhang et.al.	2603.01639	null
2026-03-02	Towards Privacy-Preserving LLM Inference via Collaborative Obfuscation (Technical Report)	Yu Lin et.al.	2603.01499	null
2026-03-02	Agentic Multi-Source Grounding for Enhanced Query Intent Understanding: A DoorDash Case Study	Emmanuel Aboah Boateng et.al.	2603.01486	null
2026-03-02	SFCo-Nav: Efficient Zero-Shot Visual Language Navigation via Collaboration of Slow LLM and Fast Attributed Graph Alignment	Chaoran Xiong et.al.	2603.01477	null
2026-03-02	Quasar: Quantized Self-Speculative Acceleration for Rapid Inference via Memory-Efficient Verification	Guang Huang et.al.	2603.01399	null
2026-02-27	LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding	Alexander Samarin et.al.	2602.23881	null
2026-02-27	SLA-Aware Distributed LLM Inference Across Device-RAN-Cloud	Hariz Yet et.al.	2602.23722	null
2026-02-26	Discourse-Aware Dual-Track Streaming Response for Low-Latency Spoken Dialogue Systems	Siyuan Liu et.al.	2602.23266	null
2026-02-26	Accelerating Local LLMs on Resource-Constrained Edge Devices via Distributed Prompt Caching	Hiroki Matsutani et.al.	2602.22812	null
2026-02-25	Sustainable LLM Inference using Context-Aware Model Switching	Yuvarani et.al.	2602.22261	null
2026-02-25	Small Wins Big: Comparing Large Language Models and Domain Fine-Tuned Models for Sarcasm Detection in Code-Mixed Hinglish Text	Bitan Majumder et.al.	2602.21933	null
2026-02-26	DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference	Yongtong Wu et.al.	2602.21548	null
2026-02-24	SymTorch: A Framework for Symbolic Distillation of Deep Neural Networks	Elizabeth S. Z. Tan et.al.	2602.21307	null
2026-02-24	ReviveMoE: Fast Recovery for Hardware Failures in Large-Scale MoE LLM Inference Deployments	Haley Li et.al.	2602.21140	null
2026-02-24	CHESS: Context-aware Hierarchical Efficient Semantic Selection for Long-Context LLM Inference	Chao Fei et.al.	2602.20732	null
2026-02-24	FAST-Prefill: FPGA Accelerated Sparse Attention for Long Context LLM Prefill	Rakshith Jayanth et.al.	2602.20515	null
2026-02-23	KnapSpec: Self-Speculative Decoding via Adaptive Layer Selection as a Knapsack Problem	Seongjin Cha et.al.	2602.20217	null
2026-02-21	MoBiQuant: Mixture-of-Bits Quantization for Token-Adaptive Elastic LLMs	Dongwei Wang et.al.	2602.20191	null
2026-02-22	A Power Market Model with Hyperscalers and Modular Datacenters	Yihsu Chen et.al.	2602.19310	null
2026-02-22	Scaling Inference-Time Computation via Opponent Simulation: Enabling Online Strategic Adaptation in Repeated Negotiation	Xiangyu Liu et.al.	2602.19309	null
2026-02-21	WANSpec: Leveraging Global Compute Capacity for LLM Inference	Noah Martin et.al.	2602.18931	null
2026-02-21	BiScale: Energy-Efficient Disaggregated LLM Serving via Phase-Aware Placement and DVFS	Omar Basit et.al.	2602.18755	null
2026-02-21	HillInfer: Efficient Long-Context LLM Inference on the Edge with Hierarchical KV Eviction using SmartSSD	He Sun et.al.	2602.18750	null
2026-02-24	RPU -- A Reasoning Processing Unit	Matthew Adiletta et.al.	2602.18568	null
2026-02-20	Dual-Tree LLM-Enhanced Negative Sampling for Implicit Collaborative Filtering	Jiayi Wu et.al.	2602.18249	null
2026-02-19	Privacy-Preserving Mechanisms Enable Cheap Verifiable Inference of LLMs	Arka Pal et.al.	2602.17223	null
2026-02-18	Privacy-Aware Split Inference with Speculative Decoding for Large Language Models over Wide-Area Networks	Michael Cunningham et.al.	2602.16760	null
2026-02-18	LLM-Driven Intent-Based Privacy-Aware Orchestration Across the Cloud-Edge Continuum	Zijie Su et.al.	2602.16100	null
2026-02-17	CLAA: Cross-Layer Attention Aggregation for Accelerating LLM Prefill	Bradley McDanel et.al.	2602.16054	null
2026-02-17	MoE-Spec: Expert Budgeting for Efficient Speculative Decoding	Bradley McDanel et.al.	2602.16052	null
2026-02-17	Learning to Retrieve Navigable Candidates for Efficient Vision-and-Language Navigation	Shutian Gu et.al.	2602.15724	null
2026-02-16	Efficient Multi-round LLM Inference over Disaggregated Serving	Wenhao He et.al.	2602.14516	null
2026-02-16	WiSparse: Boosting LLM Inference Efficiency with Weight-Aware Mixed Activation Sparsity	Lei Chen et.al.	2602.14452	null
2026-02-15	HiVid: LLM-Guided Video Saliency For Content-Aware VOD And Live Streaming	Jiahui Chen et.al.	2602.14214	null
2026-02-14	ThunderAgent: A Simple, Fast and Program-Aware Agentic Inference System	Hao Kang et.al.	2602.13692	null
2026-02-13	Characterize LSM-tree Compaction Performance via On-Device LLM Inference	Jiabiao Ding et.al.	2602.12669	null
2026-02-13	Unleashing Low-Bit Inference on Ascend NPUs: A Comprehensive Evaluation of HiFloat Formats	Pengxiang Zhao et.al.	2602.12635	null
2026-02-13	TensorCommitments: A Lightweight Verifiable Inference for Language Models	Oguzhan Baser et.al.	2602.12630	null
2026-02-12	Predicting LLM Output Length via Entropy-Guided Representations	Huanyi Xie et.al.	2602.11812	null
2026-02-12	Deep Kernel Fusion for Transformers	Zixi Zhang et.al.	2602.11808	null
2026-02-12	GORGO: Maximizing KV-Cache Reuse While Minimizing Network Latency in Cross-Region LLM Load Balancing	Alessio Ricci Toniolo et.al.	2602.11688	null
2026-02-12	Differentially Private and Communication Efficient Large Language Model Split Inference via Stochastic Quantization and Soft Prompt	Yujie Gu et.al.	2602.11513	null
2026-02-12	Cachemir: Fully Homomorphic Encrypted Inference of Generative Large Language Model with KV Cache	Ye Yu et.al.	2602.11470	null
2026-02-11	Vulnerabilities in Partial TEE-Shielded LLM Inference with Precomputed Noise	Abhishek Saini et.al.	2602.11088	null
2026-02-12	S-GRec: Personalized Semantic-Aware Generative Recommendation with Asymmetric Advantage	Jie Jiang et.al.	2602.10606	null
2026-02-10	Beyond SMILES: Evaluating Agentic Systems for Drug Discovery	Edward Wijaya et.al.	2602.10163	null
2026-02-12	Efficient Remote Prefix Fetching with GPU-native Media ASICs	Liang Mi et.al.	2602.09725	null
2026-02-10	MATA: Multi-Agent Framework for Reliable and Flexible Table Question Answering	Sieun Hyeon et.al.	2602.09642	null
2026-02-10	LLM-CoOpt: A Co-Design and Optimization Framework for Efficient LLM Inference on Heterogeneous Platforms	Jie Kong et.al.	2602.09323	null
2026-02-09	Benchmarking the Energy Savings with Speculative Decoding Strategies	Rohit Dutta et.al.	2602.09113	null
2026-02-09	FlattenGPT: Depth Compression for Transformer with Layer Flattening	Ruihan Xu et.al.	2602.08858	null
2026-02-09	Near-Oracle KV Selection via Pre-hoc Sparsity for Long-Context Inference	Yifei Gao et.al.	2602.08329	null
2026-02-10	Compiler-Assisted Speculative Sampling for Accelerated LLM Inference on Heterogeneous Edge Devices	Alejandro Ruiz y Mesa et.al.	2602.08060	null
2026-02-08	Accuracy-Delay Trade-Off in LLM Offloading via Token-Level Uncertainty	Yumin Kim et.al.	2602.07958	null
2026-02-08	MedCoG: Maximizing LLM Inference Density in Medical Reasoning via Meta-Cognitive Regulation	Yu Zhao et.al.	2602.07905	null
2026-02-08	Rethinking Latency Denial-of-Service: Attacking the LLM Serving Framework, Not the Model	Tianyi Wang et.al.	2602.07878	null
2026-02-07	ParisKV: Fast and Drift-Robust KV-Cache Retrieval for Long-Context LLMs	Yanlin Qi et.al.	2602.07721	null
2026-02-07	Scout Before You Attend: Sketch-and-Walk Sparse Attention for Efficient LLM Inference	Hoang Anh Duy Le et.al.	2602.07397	null
2026-02-06	SpecAttn: Co-Designing Sparse Attention with Self-Speculative Decoding	Yikang Yue et.al.	2602.07223	null
2026-02-06	Do LLMs Act Like Rational Agents? Measuring Belief Coherence in Probabilistic Decision Making	Khurram Yamin et.al.	2602.06286	null
2026-02-05	Towards Green AI: Decoding the Energy of LLM Inference in Software Development	Lola Solovyeva et.al.	2602.05712	null
2026-02-05	Determining Energy Efficiency Sweet Spots in Production LLM Inference	Hiari Pizzini Cavagna et.al.	2602.05695	null
2026-02-05	Optimal Bayesian Stopping for Efficient Inference of Consistent LLM Answers	Jingkai Huang et.al.	2602.05395	null
2026-02-05	TIDE: Temporal Incremental Draft Engine for Self-Improving LLM Inference	Jiyoung Park et.al.	2602.05145	null
2026-02-04	GPU-to-Grid: Voltage Regulation via GPU Utilization Control	Zhirui Liang et.al.	2602.05116	null
2026-02-04	Harmonia: Algorithm-Hardware Co-Design for Memory- and Compute-Efficient BFP-based LLM Inference	Xinyu Wang et.al.	2602.04595	null
2026-02-04	LycheeDecode: Accelerating Long-Context LLM Inference via Hybrid-Head Sparse Decoding	Gang Lin et.al.	2602.04541	null
2026-02-04	BPDQ: Bit-Plane Decomposition Quantization on a Variable Grid for Large Language Models	Junyu Chen et.al.	2602.04163	null
2026-02-03	DynSplit-KV: Dynamic Semantic Splitting for KVCache Compression in Efficient Long-Context LLM Inference	Jiancai Ye et.al.	2602.03184	null
2026-02-03	NLI: Non-uniform Linear Interpolation Approximation of Nonlinear Operations for Efficient LLMs Inference	Jiangyong Yu et.al.	2602.02988	null
2026-02-03	Large-Scale LLM Inference with Heterogeneous Workloads: Prefill-Decode Contention and Asymptotically Optimal Control	Ruihan Lin et.al.	2602.02987	null
2026-02-02	Focus-dLLM: Accelerating Long-Context Diffusion LLM Inference via Confidence-Guided Context Focusing	Lingkun Long et.al.	2602.02159	null
2026-01-30	Fast Forward: Accelerating LLM Prefill with Predictive FFN Sparsity	Aayush Gautam et.al.	2602.00397	null
2026-01-30	Harvest: Opportunistic Peer-to-Peer GPU Caching for LLM Inference	Nikhil Gopal et.al.	2602.00328	null
2026-01-30	EigenAI: Deterministic Inference, Verifiable Results	David Ribeiro Alves et.al.	2602.00182	null
2026-01-30	Safer Policy Compliance with Dynamic Epistemic Fallback	Joseph Marvin Imperial et.al.	2601.23094	null
2026-01-30	Competitive Non-Clairvoyant KV-Cache Scheduling for LLM Inference	Yiding Feng et.al.	2601.22996	null
2026-01-30	Matterhorn: Efficient Analog Sparse Spiking Transformer Architecture with Masked Time-To-First-Spike Encoding	Zhanglu Yan et.al.	2601.22876	null
2026-01-30	OSNIP: Breaking the Privacy-Utility-Efficiency Trilemma in LLM Inference via Obfuscated Semantic Null Space	Zhiyuan Cao et.al.	2601.22752	null
2026-01-30	SCaLRec: Semantic Calibration for LLM-enabled Cloud-Device Sequential Recommendation	Ruiqi Zheng et.al.	2601.22543	null
2026-01-29	Understanding Efficiency: Quantization, Batching, and Serving Strategies in LLM Energy Use	Julien Delavande et.al.	2601.22362	null
2026-01-29	EWSJF: An Adaptive Scheduler with Hybrid Partitioning for Mixed-Workload LLM Inference	Bronislav Sidik et.al.	2601.21758	null
2026-01-29	Adaptive and Robust Cost-Aware Proof of Quality for Decentralized LLM Inference Networks	Arther Tian et.al.	2601.21189	null
2026-01-28	ChunkWise LoRA: Adaptive Sequence Partitioning for Memory-Efficient Low-Rank Adaptation and Accelerated LLM Inference	Ketan Thakkar et.al.	2601.21109	null
2026-01-29	ProfInfer: An eBPF-based Fine-Grained LLM Inference Profiler	Bohua Zou et.al.	2601.20755	null
2026-01-29	DRAINCODE: Stealthy Energy Consumption Attacks on Retrieval-Augmented Code Generation via Context Poisoning	Yanlin Wang et.al.	2601.20615	null
2026-01-28	TABED: Test-Time Adaptive Ensemble Drafting for Robust Speculative Decoding in LVLMs	Minjae Lee et.al.	2601.20357	null
2026-01-28	Beyond Speedup -- Utilizing KV Cache for Sampling and Reasoning	Zeyu Xing et.al.	2601.20326	null
2026-01-28	SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference on Superchips	Jiahuan Yu et.al.	2601.20309	null
2026-01-28	LogSieve: Task-Aware CI Log Reduction for Sustainable LLM-Based Analysis	Marcus Emmanuel Barnes et.al.	2601.20148	null
2026-01-27	Identifying and Transferring Reasoning-Critical Neurons: Improving LLM Inference Reliability via Activation Steering	Fangan Dong et.al.	2601.19847	null
2026-01-27	DART: Diffusion-Inspired Speculative Decoding for Fast LLM Inference	Fuliang Liu et.al.	2601.19278	null
2026-01-26	Randomization Boosts KV Caching, Learning Balances Query Load: A Joint Perspective	Fangzhou Wu et.al.	2601.18999	null
2026-01-26	Flatter Tokens are More Valuable for Speculative Draft Model Training	Jiaming Fan et.al.	2601.18902	null
2026-01-26	Scaling up Privacy-Preserving ML: A CKKS Implementation of Llama-2-7B	Jaiyoung Park et.al.	2601.18511	null
2026-01-26	FABLE: Forest-Based Adaptive Bi-Path LLM-Enhanced Retrieval for Multi-Document Reasoning	Lin Sun et.al.	2601.18116	null
2026-01-25	LLM-42: Enabling Determinism in LLM Inference with Verified Speculation	Raja Gond et.al.	2601.17768	null
2026-01-25	Fast KVzip: Efficient and Accurate LLM Inference with Gated KV Eviction	Jang-Hyun Kim et.al.	2601.17668	null
2026-01-24	GreenServ: Energy-Efficient Context-Aware Dynamic Routing for Multi-Model LLM Inference	Thomas Ziller et.al.	2601.17551	null
2026-01-22	FlexLLM: Composable HLS Library for Flexible Hybrid LLM Accelerator Design	Jiahao Zhang et.al.	2601.15710	null
2026-01-21	MARS: Unleashing the Power of Speculative Decoding via Margin-Aware Verification	Jingwei Song et.al.	2601.15498	null
2026-01-21	QMC: Efficient SLM Edge Inference via Outlier-Aware Quantization and Emergent Memories Co-Design	Nilesh Prasad Pandey et.al.	2601.14549	null
2026-01-20	HeteroCache: A Dynamic Retrieval Approach to Heterogeneous KV Cache Compression for Long-Context LLM Inference	Zhiyuan Shi et.al.	2601.13684	null
2026-01-20	PRIMAL: Processing-In-Memory Based Low-Rank Adaptation for LLM Inference Accelerator	Yue Jiet Chong et.al.	2601.13628	null
2026-01-19	Explicit Cognitive Allocation: A Principle for Governed and Auditable Inference in Large Language Models	Héctor Manuel Manzanilla-Granados et.al.	2601.13443	null
2026-01-19	Probe and Skip: Self-Predictive Token Skipping for Efficient Long-Context LLM Inference	Zimeng Wu et.al.	2601.13155	null
2026-01-19	From Prefix Cache to Fusion RAG Cache: Accelerating LLM Inference in Retrieval-Augmented Generation	Jiahao Wang et.al.	2601.12904	null
2026-01-18	Power Aware Dynamic Reallocation For Inference	Yiwei Jiang et.al.	2601.12241	null
2026-01-16	RAPID-Serve: Resource-efficient and Accelerated P/D Intra-GPU Disaggregation	Amna Masood et.al.	2601.11822	null
2026-01-16	HALO: Semantic-Aware Distributed LLM Inference in Lossy Edge Network	Peirong Zheng et.al.	2601.11676	null
2026-01-15	WISP: Waste- and Interference-Suppressed Distributed Speculative LLM Serving at the Edge via Dynamic Drafting and SLO-Aware Batching	Xiangchen Li et.al.	2601.11652	null
2026-01-16	FORESTLLM: Large Language Models Make Random Forest Great on Few-shot Tabular Learning	Zhihan Yang et.al.	2601.11311	null
2026-01-14	Private LLM Inference on Consumer Blackwell GPUs: A Practical Guide for Cost-Effective Local Deployment in SMEs	Jonathan Knoop et.al.	2601.09527	null
2026-01-14	LatencyPrism: Online Non-intrusive Latency Sculpting for SLO-Guaranteed LLM Inference	Du Yin et.al.	2601.09258	null
2026-01-13	HIPPO: Accelerating Video Large Language Models Inference via Holistic-aware Parallel Speculative Decoding	Qitan Lv et.al.	2601.08273	null
2026-01-13	Coordinated Cooling and Compute Management for AI Datacenters	Nardos Belay Abera et.al.	2601.08113	null
2026-01-12	Adaptive Layer Selection for Layer-Wise Token Pruning in LLM Inference	Rei Taniguchi et.al.	2601.07667	null
2026-01-12	ARCQuant: Boosting NVFP4 Quantization with Augmented Residual Channels for LLMs	Haoqian Meng et.al.	2601.07475	null
2026-01-12	TALON: Confidence-Aware Speculative Decoding with Adaptive Token Trees	Tianyu Liu et.al.	2601.07353	null
2026-01-12	Stochastic CHAOS: Why Deterministic Inference Kills, and Distributional Variability Is the Heartbeat of Artificial Cognition	Tanmay Joshi et.al.	2601.07239	null
2026-01-09	AIConfigurator: Lightning-Fast Configuration Optimization for Multi-Framework LLM Serving	Tianhao Xu et.al.	2601.06288	null
2026-01-07	AutoVulnPHP: LLM-Powered Two-Stage PHP Vulnerability Detection and Automated Localization	Zhiqiang Wang et.al.	2601.06177	null
2026-01-14	Challenges and Research Directions for Large Language Model Inference Hardware	Xiaoyu Ma et.al.	2601.05047	null
2026-01-08	Revisiting Judge Decoding from First Principles via Training-Free Distributional Divergence	Shengyin Sun et.al.	2601.04766	null
2026-01-08	GPU-Accelerated INT8 Quantization for KV Cache Compression in Large Language Models	Maanas Taneja et.al.	2601.04719	null
2026-01-07	XGrammar 2: Dynamic and Efficient Structured Generation Engine for Agentic LLMs	Linzhang Li et.al.	2601.04426	null
2026-01-05	LoRA-Drop: Temporal LoRA Decoding for Efficient LLM Inference	Hossein Rajabzadeh et.al.	2601.02569	null
2026-01-06	Making MoE-based LLM Inference Resilient with Tarragon	Songyu Zhang et.al.	2601.01310	null
2026-01-08	From Policy to Logic for Efficient and Interpretable Coverage Assessment	Rhitabrat Pokharel et.al.	2601.01266	null
2026-01-01	FlashInfer-Bench: Building the Virtuous Cycle for AI-driven LLM Systems	Shanli Xing et.al.	2601.00227	null
2025-12-31	FPGA Co-Design for Efficient N:M Sparse and Quantized Model Inference	Fen-Yu Hsieh et.al.	2512.24713	null
2026-01-04	Hardware Acceleration for Neural Networks: A Comprehensive Survey	Bin Xu et.al.	2512.23914	null
2025-12-29	Yggdrasil: Bridging Dynamic Speculation and Static Runtime for Latency-Optimal Tree-Based LLM Decoding	Yue Guan et.al.	2512.23858	null
2025-12-28	Viability and Performance of a Private LLM Server for SMBs: A Benchmark Analysis of Qwen3-30B on Consumer-Grade Hardware	Alex Khalil et.al.	2512.23029	null
2025-12-28	Argus: Token Aware Distributed LLM Inference Optimization	Panlong Wu et.al.	2512.22925	null
2025-12-27	Nightjar: Dynamic Adaptive Speculative Decoding for Large Language Models Serving	Rui Li et.al.	2512.22420	null
2025-12-22	Mirage Persistent Kernel: A Compiler and Runtime for Mega-Kernelizing Tensor Programs	Xinhao Cheng et.al.	2512.22219	null
2025-12-20	MatKV: Trading Compute for Flash Storage in LLM Inference	Kun-Woo Shin et.al.	2512.22195	null
2025-12-26	Prefill vs. Decode Bottlenecks: SRAM-Frequency Tradeoffs and the Memory-Bandwidth Ceiling	Hannah Atmer et.al.	2512.22066	null
2025-12-26	Optimizing Resource Allocation for Geographically-Distributed Inference by Large Language Models	Tingyang Sun et.al.	2512.21884	null
2025-12-26	LIME: Accelerating Collaborative Lossless LLM Inference on Memory-Constrained Edge Devices	Mingyu Sun et.al.	2512.21835	null
2025-12-23	Predictive-LoRA: A Proactive and Fragmentation-Aware Serverless Inference System for LLMs	Yinan Ni et.al.	2512.20210	null
2025-12-23	Concept Generalization in Humans and Large Language Models: Insights from the Number Game	Arghavan Bazigaran et.al.	2512.20162	null
2025-12-20	TraCT: Disaggregated LLM Serving with CXL Shared Memory KV Cache at Rack-Scale	Dongha Yoon et.al.	2512.18194	null
2025-12-20	Making Strong Error-Correcting Codes Work Effectively for HBM in AI Inference	Rui Xie et.al.	2512.18152	null
2025-12-19	Specification and Detection of LLM Code Smells	Brahim Mahmoudi et.al.	2512.18020	null
2025-12-19	CodeGEMM: A Codebook-Centric Approach to Efficient GEMM in Quantized LLMs	Gunho Park et.al.	2512.17970	null
2025-12-19	Enabling Disaggregated Multi-Stage MLLM Inference via GPU-Internal Scheduling and Resource Sharing	Lingxiao Zhao et.al.	2512.17574	null
2025-12-22	Learning What to Write: Write-Gated KV for Efficient Long-Context Inference	Yen-Chieh Huang et.al.	2512.17452	null
2025-12-18	Kascade: A Practical Sparse Attention Method for Long-Context LLM Inference	Dhruv Deshmukh et.al.	2512.16391	null
2025-12-18	Design and Evaluation of Cost-Aware PoQ for Decentralized LLM Inference	Arther Tian et.al.	2512.16317	null
2025-12-18	Fast Collaborative Inference via Distributed Speculative Decoding	Ce Zheng et.al.	2512.16273	null
2025-12-18	Staggered Batch Scheduling: Co-optimizing Time-to-First-Token and Throughput for High-Efficiency LLM Inference	Jian Tian et.al.	2512.16134	null
2025-12-16	EVICPRESS: Joint KV-Cache Compression and Eviction for Efficient LLM Serving	Shaoting Feng et.al.	2512.14946	null
2025-12-16	Adaptive Cache Pollution Control for Large Language Model Inference Workloads Using Temporal CNN-Based Prediction and Priority-Aware Replacement	Songze Liu et.al.	2512.14151	null
2025-12-14	Counting Clues: A Lightweight Probabilistic Baseline Can Match an LLM	Furong Jia et.al.	2512.12868	null
2025-12-14	Fine-Grained Energy Prediction For Parallelized LLM Inference With PIE-P	Anurag Dutt et.al.	2512.12801	null
2025-12-13	V-Rex: Real-Time Streaming Video LLM Acceleration via Dynamic KV Cache Retrieval	Donghyuk Kim et.al.	2512.12284	null
2025-12-12	Learning to Extract Context for Context-Aware LLM Inference	Minseon Kim et.al.	2512.11986	null
2025-12-12	PD-Swap: Prefill-Decode Logic Swapping for End-to-End LLM Inference on Edge FPGAs via Dynamic Partial Reconfiguration	Yifan Zhang et.al.	2512.11550	null
2025-12-12	AdaSD: Adaptive Speculative Decoding for Efficient Language Model Inference	Kuan-Wei Lu et.al.	2512.11280	null
2025-12-12	Adaptive Soft Rolling KV Freeze with Entropy-Guided Recovery: Sublinear Memory Growth for Efficient LLM Inference	Adilet Metinov et.al.	2512.11221	null
2025-12-11	LLM-Auction: Generative Auction towards LLM-Native Advertising	Chujie Zhao et.al.	2512.10551	null
2025-12-14	GoodSpeed: Optimizing Fair Goodput with Adaptive Speculative Decoding in Distributed Edge Inference	Phuong Tran et.al.	2512.09963	null
2025-12-10	RACAM: Enhancing DRAM with Reuse-Aware Computation and Automated Mapping for ML Inference	Siyuan Ma et.al.	2512.09304	null
2025-12-09	Magneton: Optimizing Energy Efficiency of ML Systems via Differential Energy Debugging	Yi Pan et.al.	2512.08365	null
2025-12-08	NeSTR: A Neuro-Symbolic Abductive Framework for Temporal Reasoning in Large Language Models	Feng Liang et.al.	2512.07218	null
2025-12-08	Leveraging KV Similarity for Online Structured Pruning in LLMs	Jungmin Lee et.al.	2512.07090	null
2025-12-07	PrivLLMSwarm: Privacy-Preserving LLM-Driven UAV Swarms for Secure IoT Surveillance	Jifar Wakuma Ayana et.al.	2512.06747	null
2025-12-07	KV-CAR: KV Cache Compression using Autoencoders and KV Reuse in Large Language Models	Sourjya Roy et.al.	2512.06727	null
2025-12-06	Vec-LUT: Vector Table Lookup for Parallel Ultra-Low-Bit LLM Inference on Edge Devices	Xiangyu Li et.al.	2512.06443	null
2025-12-05	Compass: Mapping Space Exploration for Multi-Chiplet Accelerators Targeting LLM Inference Serving Workloads	Boyu Li et.al.	2512.06093	null
2025-12-05	KQ-SVD: Compressing the KV Cache with Provable Guarantees on Attention Fidelity	Damien Lesens et.al.	2512.05916	null
2025-12-05	RoBoN: Routed Online Best-of-n for Test-Time Scaling with Multiple LLMs	Jonathan Geuter et.al.	2512.05542	null
2025-12-05	Automated Identification of Incidentalomas Requiring Follow-Up: A Multi-Anatomy Evaluation of LLM-Based and Supervised Approaches	Namu Park et.al.	2512.05537	null
2025-12-05	Knowing Your Uncertainty -- On the application of LLM in social sciences	Bolun Zhang et.al.	2512.05461	null
2025-12-04	Towards A Cultural Intelligence and Values Inferences Quality Benchmark for Community Values and Common Knowledge	Brittany Johnson et.al.	2512.05176	null
2025-12-04	Semantic Soft Bootstrapping: Long Context Reasoning in LLMs without Reinforcement Learning	Purbesh Mitra et.al.	2512.05105	null
2025-12-04	David vs. Goliath: Can Small Models Win Big with Agentic AI in Hardware Design?	Shashwat Shankar et.al.	2512.05073	null
2025-12-04	MemLoRA: Distilling Expert Adapters for On-Device Memory Systems	Massimo Bini et.al.	2512.04763	null
2025-12-04	EtCon: Edit-then-Consolidate for Reliable Knowledge Editing	Ruilin Li et.al.	2512.04753	null
2025-12-04	RLHFSpec: Breaking the Efficiency Bottleneck in RLHF Training via Adaptive Drafting	Siqi Wang et.al.	2512.04752	null
2025-12-04	Measuring the Unspoken: A Disentanglement Model and Benchmark for Psychological Analysis in the Wild	Yigui Feng et.al.	2512.04728	null
2025-12-04	PBFuzz: Agentic Directed Fuzzing for PoV Generation	Haochen Zeng et.al.	2512.04611	null
2025-12-04	A Light-Weight Large Language Model File Format for Highly-Secure Model Distribution	Huifeng Zhu et.al.	2512.04580	null
2025-12-04	On the Limits of Test-Time Compute: Sequential Reward Filtering for Better Inference	Yue Yu et.al.	2512.04558	null
2025-12-04	MSME: A Multi-Stage Multi-Expert Framework for Zero-Shot Stance Detection	Yuanshuo Zhang et.al.	2512.04492	null
2025-12-04	LLM-SrcLog: Towards Proactive and Unified Log Template Extraction via Large Language Models	Jiaqi Sun et.al.	2512.04474	null
2025-12-03	AugServe: Adaptive Request Scheduling for Augmented Large Language Model Inference Serving	Ying Wang et.al.	2512.04013	null
2025-12-03	OD-MoE: On-Demand Expert Loading for Cacheless Edge-Distributed MoE Inference	Liujianfu Wang et.al.	2512.03927	null
2025-12-03	Training and Evaluation of Guideline-Based Medical Reasoning in LLMs	Michael Staniek et.al.	2512.03838	null
2025-12-03	ConvRot: Rotation-Based Plug-and-Play 4-bit Quantization for Diffusion Transformers	Feice Huang et.al.	2512.03673	null
2025-12-03	KVNAND: Efficient On-Device Large Language Model Inference Using DRAM-Free In-Flash Computing	Lishuo Deng et.al.	2512.03608	null
2025-12-03	EnCompass: Enhancing Agent Programming with Search Over Program Execution Paths	Zhening Li et.al.	2512.03571	null
2025-12-03	A Preliminary Study on the Promises and Challenges of Native Top-$k$ Sparse Attention	Di Xiu et.al.	2512.03494	null
2025-12-03	From Hypothesis to Premises: LLM-based Backward Logical Reasoning with Selective Symbolic Translation	Qingchuan Li et.al.	2512.03360	null
2025-12-03	Cache What Lasts: Token Retention for Memory-Bounded KV Cache in LLMs	Ngoc Bui et.al.	2512.03324	null
2025-12-02	LLM-Guided Material Inference for 3D Point Clouds	Nafiseh Izadyar et.al.	2512.03237	null
2025-12-02	TokenPowerBench: Benchmarking the Power Consumption of LLM Inference	Chenxu Niu et.al.	2512.03024	null
2025-12-02	Distribution-Calibrated Inference time compute for Thinking LLM-as-a-Judge	Hamid Dadkhahi et.al.	2512.03019	null
2025-12-02	FAIRY2I: Universal Extremely-Low Bit QAT framework via Widely-Linear Representation and Phase-Aware Quantization	Feiyu Wang et.al.	2512.02901	null
2025-12-02	OptPO: Optimal Rollout Allocation for Test-time Policy Optimization	Youkang Wang et.al.	2512.02882	null
2025-12-02	Cross-Lingual Prompt Steerability: Towards Accurate and Robust LLM Behavior across Languages	Lechen Zhang et.al.	2512.02841	null
2025-12-02	FiMMIA: scaling semantic perturbation-based membership inference across modalities	Anton Emelyanov et.al.	2512.02786	null
2025-12-02	Emergent Bayesian Behaviour and Optimal Cue Combination in LLMs	Julian Ma et.al.	2512.02719	null
2025-12-02	CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning	Songqiao Su et.al.	2512.02551	null
2025-12-02	In-Context Distillation with Self-Consistency Cascades: A Simple, Training-Free Way to Reduce LLM Agent Costs	Vishnu Sarukkai et.al.	2512.02543	null
2025-12-02	Reasoning Path and Latent State Analysis for Multi-view Visual Spatial Reasoning: A Cognitive Science Perspective	Qiyao Xue et.al.	2512.02340	null
2025-12-01	Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling	Jack Cook et.al.	2512.02010	null
2025-12-01	The Art of Scaling Test-Time Compute for Large Language Models	Aradhye Agarwal et.al.	2512.02008	null
2025-12-01	KV Pareto: Systems-Level Optimization of KV Cache and Model Compression for Long Context Inference	Sai Gokhale et.al.	2512.01953	null
2025-12-01	Latent Debate: A Surrogate Framework for Interpreting LLM Thinking	Lihu Chen et.al.	2512.01909	null
2025-12-01	DreamingComics: A Story Visualization Pipeline via Subject and Layout Customized Generation using Video Models	Patrick Kwon et.al.	2512.01686	null
2025-12-01	A Systematic Characterization of LLM Inference on GPUs	Haonan Wang et.al.	2512.01644	null
2025-12-01	LLM2Fx-Tools: Tool Calling For Music Post-Production	Seungheon Doh et.al.	2512.01559	null
2025-12-01	Multi-Path Collaborative Reasoning via Reinforcement Learning	Jindi Lv et.al.	2512.01485	null
2025-12-01	ZIP-RC: Zero-overhead Inference-time Prediction of Reward and Cost for Adaptive and Interpretable Generation	Rohin Manvi et.al.	2512.01457	null
2025-12-01	Kardia-R1: Unleashing LLMs to Reason toward Understanding and Empathy for Emotional Support via Rubric-as-Judge Reinforcement Learning	Jiahao Yuan et.al.	2512.01282	null
2025-11-30	Reward Auditor: Inference on Reward Modeling Suitability in Real-World Perturbed Scenarios	Jianxiang Zang et.al.	2512.00920	null
2025-11-30	AFRAgent: An Adaptive Feature Renormalization Based High Resolution Aware GUI agent	Neeraj Anand et.al.	2512.00846	null
2025-11-30	ARCADIA: Scalable Causal Discovery for Corporate Bankruptcy Analysis Using Agentic AI	Fabrizio Maturo et.al.	2512.00839	null
2025-11-30	SIMPLE: Disaggregating Sampling from GPU Inference into a Decision Plane for Faster Distributed LLM Serving	Bohan Zhao et.al.	2512.00719	null
2025-11-29	SCALE: Selective Resource Allocation for Overcoming Performance Bottlenecks in Mathematical Test-time Scaling	Yang Xiao et.al.	2512.00466	null
2025-11-29	Echo-N1: Affective RL Frontier	Naifan Zhang et.al.	2512.00344	null
2025-11-29	Efficient Kernel Mapping and Comprehensive System Evaluation of LLM Acceleration on a CGLA	Takuto Ando et.al.	2512.00335	null
2025-11-29	RL-Struct: A Lightweight Reinforcement Learning Framework for Reliable Structured Output in LLMs	Ruike Hu et.al.	2512.00319	null
2025-11-29	Evolving Paradigms in Task-Based Search and Learning: A Comparative Analysis of Traditional Search Engine with LLM-Enhanced Conversational Search System	Zhitong Guan et.al.	2512.00313	null
2025-11-28	Demystifying Errors in LLM Reasoning Traces: An Empirical Study of Code Execution Simulation	Mohammad Abdollahi et.al.	2512.00215	null
2025-11-28	ThetaEvolve: Test-time Learning on Open Problems	Yiping Wang et.al.	2511.23473	null
2025-11-28	Behavior-Equivalent Token: Single-Token Replacement for Long Prompts in LLMs	Jiancheng Dong et.al.	2511.23271	null
2025-11-28	Unlocking Multilingual Reasoning Capability of LLMs and LVLMs through Representation Engineering	Qiming Li et.al.	2511.23231	null
2025-11-28	HPSU: A Benchmark for Human-Level Perception in Real-World Spoken Speech Understanding	Chen Li et.al.	2511.23178	null
2025-11-28	Training-Free Loosely Speculative Decoding: Accepting Semantically Correct Drafts Beyond Exact Match	Jinze Li et.al.	2511.22972	null
2025-11-28	Experts are all you need: A Composable Framework for Large Language Model Inference	Shrihari Sridharan et.al.	2511.22955	null
2025-11-28	Visual Puns from Idioms: An Iterative LLM-T2IM-MLLM Framework	Kelaiti Xiao et.al.	2511.22943	null
2025-11-28	RAG-Empowered LLM-Driven Dynamic Radio Resource Management in Open 6G RAN	Onur Salan et.al.	2511.22933	null
2025-11-28	Serving Heterogeneous LoRA Adapters in Distributed LLM Inference Systems	Shashwat Jaiswal et.al.	2511.22880	null
2025-11-27	PRISM: Privacy-Aware Routing for Adaptive Cloud-Edge LLM Inference via Semantic Sketch Collaboration	Junfei Zhan et.al.	2511.22788	null
2025-11-27	CacheTrap: Injecting Trojans in LLMs without Leaving any Traces in Inputs or Weights	Mohaiminul Al Nahian et.al.	2511.22681	null
2025-11-27	GEO-Detective: Unveiling Location Privacy Risks in Images with LLM Agents	Xinyu Zhang et.al.	2511.22441	null
2025-11-27	FADiff: Fusion-Aware Differentiable Optimization for DNN Scheduling on Tensor Accelerators	Shuao Jia et.al.	2511.22348	null
2025-11-27	Edge Deployment of Small Language Models, a comprehensive comparison of CPU, GPU and NPU backends	Pablo Prieto et.al.	2511.22334	null
2025-11-27	RecToM: A Benchmark for Evaluating Machine Theory of Mind in LLM-based Conversational Recommender Systems	Mengfan Li et.al.	2511.22275	null
2025-11-27	Aquas: Enhancing Domain Specialization through Holistic Hardware-Software Co-Optimization based on MLIR	Yuyang Zou et.al.	2511.22267	null
2025-11-27	Focused Chain-of-Thought: Efficient LLM Reasoning via Structured Input Information	Lukas Struppek et.al.	2511.22176	null
2025-11-27	Statistical Independence Aware Caching for LLM Workflows	Yihan Dai et.al.	2511.22118	null
2025-11-26	A Comparative Study of LLM Prompting and Fine-Tuning for Cross-genre Authorship Attribution on Chinese Lyrics	Yuxin Li et.al.	2511.21930	null
2025-11-26	Matrix: Peer-to-Peer Multi-Agent Synthetic Data Generation Framework	Dong Wang et.al.	2511.21686	null
2025-11-26	DSD: A Distributed Speculative Decoding Solution for Edge-Cloud Agile Large Model Serving	Fengze Yu et.al.	2511.21669	null
2025-11-26	Auxiliary Metrics Help Decoding Skill Neurons in the Wild	Yixiu Zhao et.al.	2511.21610	null
2025-11-26	Automated Dynamic AI Inference Scaling on HPC-Infrastructure: Integrating Kubernetes, Slurm and vLLM	Tim Trappen et.al.	2511.21413	null
2025-11-26	PEFT-Bench: A Parameter-Efficient Fine-Tuning Methods Benchmark	Robert Belanec et.al.	2511.21285	null
2025-11-26	BRIDGE: Building Representations In Domain Guided Program Verification	Robert Joseph George et.al.	2511.21104	null
2025-11-26	MLPMoE: Zero-Shot Architectural Metamorphosis of Dense LLM MLPs into Static Mixture-of-Experts	Ivan Novikov et.al.	2511.21089	null
2025-11-26	OVOD-Agent: A Markov-Bandit Framework for Proactive Visual Reasoning and Self-Evolving Detection	Chujie Wang et.al.	2511.21064	null
2025-11-26	LOOM: Personalized Learning Informed by Daily LLM Conversations Toward Long-Term Mastery via a Dynamic Learner Memory Graph	Justin Cui et.al.	2511.21037	null
2025-11-26	CaptionQA: Is Your Caption as Useful as the Image Itself?	Shijia Yang et.al.	2511.21025	null
2025-11-26	A Dynamic PD-Disaggregation Architecture for Maximizing Goodput in LLM Inference Serving	Junhan Liao et.al.	2511.20982	null
2025-11-26	Aragog: Just-in-Time Model Routing for Scalable Serving of Agentic Workflows	Yinwei Dai et.al.	2511.20975	null
2025-11-25	Representation Interventions Enable Lifelong Unstructured Knowledge Control	Xuyuan Liu et.al.	2511.20892	null
2025-11-25	Latent Collaboration in Multi-Agent Systems	Jiaru Zou et.al.	2511.20639	null
2025-11-25	DiFR: Inference Verification Despite Nondeterminism	Adam Karvonen et.al.	2511.20621	null
2025-11-25	Beyond Generation: Multi-Hop Reasoning for Factual Accuracy in Vision-Language Models	Shamima Hossain et.al.	2511.20531	null
2025-11-25	Scaling LLM Speculative Decoding: Non-Autoregressive Forecasting in Large-Batch Scenarios	Luohe Shi et.al.	2511.20340	null
2025-11-25	LLM-Driven Transient Stability Assessment: From Automated Simulation to Neural Architecture Design	Lianzhe Hu et.al.	2511.20276	null
2025-11-25	REFLEX: Self-Refining Explainable Fact-Checking via Disentangling Truth into Style and Substance	Chuyi Kong et.al.	2511.20233	null
2025-11-25	Beluga: A CXL-Based Memory Architecture for Scalable and Efficient LLM KVCache Management	Xinjun Yang et.al.	2511.20172	null
2025-11-25	SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space	Zhenyi Shen et.al.	2511.20102	null
2025-11-25	More Bias, Less Bias: BiasPrompting for Enhanced Multiple-Choice Question Answering	Duc Anh Vu et.al.	2511.20086	null
2025-11-25	Reducing Latency of LLM Search Agent via Speculation-based Algorithm-System Co-Design	Zixiao Huang et.al.	2511.20048	null
2025-11-25	CoC-VLA: Delving into Adversarial Domain Transfer for Explainable Autonomous Driving via Chain-of-Causality Visual-Language-Action Model	Dapeng Zhang et.al.	2511.19914	null
2025-11-25	Mosaic Pruning: A Hierarchical Framework for Generalizable Pruning of Mixture-of-Experts Models	Wentao Hu et.al.	2511.19822	null
2025-11-24	Gender Bias in Emotion Recognition by Large Language Models	Maureen Herbert et.al.	2511.19785	null
2025-11-24	Learning to Reason: Training LLMs with GPT-OSS or DeepSeek R1 Reasoning Traces	Shaltiel Shmidman et.al.	2511.19333	null
2025-11-24	MAESTRO: Multi-Agent Environment Shaping through Task and Reward Optimization	Boyuan Wu et.al.	2511.19253	null
2025-11-24	Learning Plug-and-play Memory for Guiding Video Diffusion Models	Selena Song et.al.	2511.19229	null
2025-11-24	From Pixels to Posts: Retrieval-Augmented Fashion Captioning and Hashtag Generation	Moazzam Umer Gondal et.al.	2511.19149	null
2025-11-24	SWAN: Sparse Winnowed Attention for Reduced Inference Memory via Decompression-Free KV-Cache Compression	Santhosh G S et.al.	2511.18936	null
2025-11-24	Defending Large Language Models Against Jailbreak Exploits with Responsible AI Considerations	Ryan Wong et.al.	2511.18933	null
2025-11-24	KernelBand: Boosting LLM-based Kernel Optimization with a Hierarchical and Hardware-aware Multi-armed Bandit	Dezhi Ran et.al.	2511.18868	null
2025-11-24	Think Before You Prune: Selective Self-Generated Calibration for Pruning Large Reasoning Models	Yang Xiang et.al.	2511.18864	null
2025-11-24	UNeMo: Collaborative Visual-Language Reasoning and Navigation via a Multimodal World Model	Changxin Huang et.al.	2511.18845	null
2025-11-24	Optimizing LLM Code Suggestions: Feedback-Driven Timing with Lightweight State Bounds	Mohammad Nour Al Awad et.al.	2511.18842	null
2025-11-23	A Needle in a Haystack: Intent-driven Reusable Artifacts Recommendation with LLMs	Dongming Jin et.al.	2511.18343	null
2025-11-23	Skypilot: Fine-Tuning LLM with Physical Grounding for AAV Coverage Search	Zhongkai Chen et.al.	2511.18270	null
2025-11-23	LLM Reasoning for Cold-Start Item Recommendation	Shijun Li et.al.	2511.18261	null
2025-11-22	Towards Harnessing the Power of LLMs for ABAC Policy Mining	More Aayush Babasaheb et.al.	2511.18098	null
2025-11-22	L2V-CoT: Cross-Modal Transfer of Chain-of-Thought Reasoning via Latent Intervention	Yuliang Zhan et.al.	2511.17910	null
2025-11-22	QuickLAP: Quick Language-Action Preference Learning for Autonomous Driving Agents	Jordan Abi Nader et.al.	2511.17855	null
2025-11-21	Deterministic Inference across Tensor Parallel Sizes That Eliminates Training-Inference Mismatch	Ziyang Zhang et.al.	2511.17826	null
2025-11-21	APRIL: Annotations for Policy evaluation with Reliable Inference from LLMs	Aishwarya Mandyam et.al.	2511.17818	null
2025-11-21	That's not natural: The Impact of Off-Policy Training Data on Probe Performance	Nathalie Kirch et.al.	2511.17408	null
2025-11-21	SpatialGeo: Boosting Spatial Reasoning in Multimodal LLMs via Geometry-Semantics Fusion	Jiajie Guo et.al.	2511.17308	null
2025-11-21	Hallucinate Less by Thinking More: Aspect-Based Causal Abstention for Large Language Models	Vy Nguyen et.al.	2511.17170	null
2025-11-21	ChainV: Atomic Visual Hints Make Multimodal Reasoning Shorter and Better	Yuan Zhang et.al.	2511.17106	null
2025-11-21	Parametric Retrieval-Augmented Generation using Latent Routing of LoRA Adapters	Zhan Su et.al.	2511.17044	null
2025-11-21	Optimizing PyTorch Inference with LLM-Based Multi-Agent Systems	Kirill Nagaitsev et.al.	2511.16964	null
2025-11-20	Comparison of Text-Based and Image-Based Retrieval in Multimodal Retrieval Augmented Generation Large Language Model Systems	Elias Lumer et.al.	2511.16654	null
2025-11-20	Integrating Symbolic Natural Language Understanding and Language Models for Word Sense Disambiguation	Kexin Zhao et.al.	2511.16577	null
2025-11-20	The Oracle and The Prism: A Decoupled and Efficient Framework for Generative Recommendation Explanation	Jiaheng Zhang et.al.	2511.16543	null
2025-11-20	Beyond Tokens in Language Models: Interpreting Activations through Text Genre Chunks	Éloïse Benito-Rodriguez et.al.	2511.16540	null
2025-11-20	Incorporating Self-Rewriting into Large Language Model Reasoning Reinforcement	Jiashu Yao et.al.	2511.16331	null
2025-11-20	SDA: Steering-Driven Distribution Alignment for Open LLMs without Fine-Tuning	Wei Xia et.al.	2511.16324	null
2025-11-20	T2T-VICL: Unlocking the Boundaries of Cross-Task Visual In-Context Learning via Implicit Text-Driven VLMs	Shao-Jun Xia et.al.	2511.16107	null
2025-11-20	Train Short, Infer Long: Speech-LLM Enables Zero-Shot Streamable Joint ASR and Diarization on Long Audio	Mohan Shi et.al.	2511.16046	null
2025-11-20	A Scalable NorthPole System with End-to-End Vertical Integration for Low-Latency and Energy-Efficient LLM Inference	Michael V. DeBole et.al.	2511.15950	null
2025-11-19	Global Resolution: Optimal Multi-Draft Speculative Sampling via Convex Minimization	Rahul Krishna Thomas et.al.	2511.15898	null
2025-11-19	MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping	Yushi Huang et.al.	2511.15690	null
2025-11-19	A Tensor Compiler for Processing-In-Memory Architectures	Peiming Yang et.al.	2511.15503	null
2025-11-19	Know Your Intent: An Autonomous Multi-Perspective LLM Agent Framework for DeFi User Transaction Intent Mining	Qian'ang Mao et.al.	2511.15456	null
2025-11-19	Unveiling Inference Scaling for Difference-Aware User Modeling in LLM Personalization	Suyu Chen et.al.	2511.15389	null
2025-11-19	HEAD-QA v2: Expanding a Healthcare Benchmark for Reasoning	Alexis Correa-Guillén et.al.	2511.15355	null
2025-11-19	OEMA: Ontology-Enhanced Multi-Agent Collaboration Framework for Zero-Shot Clinical Named Entity Recognition	Xinli Tao et.al.	2511.15211	null
2025-11-19	As If We've Met Before: LLMs Exhibit Certainty in Recognizing Seen Files	Haodong Li et.al.	2511.15192	null
2025-11-19	Dynamic Expert Quantization for Scalable Mixture-of-Experts Inference	Kexin Chu et.al.	2511.15015	null
2025-11-18	Near-Lossless Model Compression Enables Longer Context Inference in DNA Large Language Models	Rui Zhu et.al.	2511.14694	null
2025-11-18	Attention via Synaptic Plasticity is All You Need: A Biologically Inspired Spiking Neuromorphic Transformer	Kallol Mondal et.al.	2511.14691	null
2025-11-18	Bias in, Bias out: Annotation Bias in Multilingual Large Language Models	Xia Cui et.al.	2511.14662	null
2025-11-18	AutoTool: Efficient Tool Selection for Large Language Model Agents	Jingyi Jia et.al.	2511.14650	null
2025-11-18	A Controllable Perceptual Feature Generative Model for Melody Harmonization via Conditional Variational Autoencoder	Dengyun Huang et.al.	2511.14600	null
2025-11-18	Masked IRL: LLM-Guided Reward Disambiguation from Demonstrations and Language	Minyoung Hwang et.al.	2511.14565	null
2025-11-18	CLO: Efficient LLM Inference System with CPU-Light KVCache Offloading via Algorithm-System Co-Design	Jiawei Yi et.al.	2511.14510	null
2025-11-18	Hyperion: Hierarchical Scheduling for Parallel LLM Acceleration in Multi-tier Networks	Mulei Ma et.al.	2511.14450	null
2025-11-18	PathMind: A Retrieve-Prioritize-Reason Framework for Knowledge Graph Reasoning with Large Language Models	Yu Liu et.al.	2511.14256	null
2025-11-18	Run, Ruminate, and Regulate: A Dual-process Thinking System for Vision-and-Language Navigation	Yu Zhong et.al.	2511.14131	null
2025-11-18	PRISM: Prompt-Refined In-Context System Modelling for Financial Retrieval	Chun Chet Ng et.al.	2511.14130	null
2025-11-18	Real-Time Mobile Video Analytics for Pre-arrival Emergency Medical Services	Liuyi Jin et.al.	2511.14119	null
2025-11-18	FailSafe: High-performance Resilient Serving	Ziyi Xu et.al.	2511.14116	null
2025-11-17	TZ-LLM: Protecting On-Device Large Language Models with Arm TrustZone	Xunjie Wang et.al.	2511.13717	null
2025-11-17	T-SAR: A Full-Stack Co-design for CPU-Only Ternary LLM Inference via In-Place SIMD ALU Reorganization	Hyunwoo Oh et.al.	2511.13676	null
2025-11-17	Tight and Practical Privacy Auditing for Differentially Private In-Context Learning	Yuyang Xia et.al.	2511.13502	null
2025-11-17	Dropouts in Confidence: Moral Uncertainty in Human-LLM Alignment	Jea Kwon et.al.	2511.13290	null
2025-11-17	Computational Measurement of Political Positions: A Review of Text-Based Ideal Point Estimation Algorithms	Patrick Parschan et.al.	2511.13238	null
2025-11-17	TokenSqueeze: Performance-Preserving Compression for Reasoning LLMs	Yuxiang Zhang et.al.	2511.13223	null
2025-11-17	TCM-5CEval: Extended Deep Evaluation Benchmark for LLM's Comprehensive Clinical Research Competence in Traditional Chinese Medicine	Tianai Huang et.al.	2511.13169	null
2025-11-17	MACKO: Sparse Matrix-Vector Multiplication for Low Sparsity	Vladimír Macko et.al.	2511.13061	null
2025-11-17	RAGPulse: An Open-Source RAG Workload Trace to Optimize RAG Serving Systems	Zhengchao Wang et.al.	2511.12979	null
2025-11-17	MedRule-KG: A Knowledge-Graph–Steered Scaffold for Reliable Mathematical and Biomedical Reasoning	Crystal Su et.al.	2511.12963	null
2025-11-16	ARCHE: A Novel Task to Evaluate LLMs on Latent Reasoning Chain Extraction	Pengze Li et.al.	2511.12485	null
2025-11-16	Probing Preference Representations: A Multi-Dimensional Evaluation and Analysis Method for Reward Models	Chenglong Wang et.al.	2511.12464	null
2025-11-15	Optimal Self-Consistency for Efficient Reasoning with Large Language Models	Austin Feng et.al.	2511.12309	null
2025-11-15	Sangam: Chiplet-Based DRAM-PIM Accelerator with CXL Integration for LLM Inferencing	Khyati Kiyawat et.al.	2511.12286	null
2025-11-15	MME-RAG: Multi-Manager-Expert Retrieval-Augmented Generation for Fine-Grained Entity Recognition in Task-Oriented Dialogues	Liang Xue et.al.	2511.12213	null
2025-11-15	AI-Salesman: Towards Reliable Large Language Model Driven Telemarketing	Qingyu Zhang et.al.	2511.12133	null
2025-11-15	OAD-Promoter: Enhancing Zero-shot VQA using Large Language Models with Object Attribute Description	Quanxing Xu et.al.	2511.12131	null
2025-11-15	BudgetLeak: Membership Inference Attacks on RAG Systems via the Generation Budget Side Channel	Hao Li et.al.	2511.12043	null
2025-11-15	Striking the Right Balance between Compute and Copy: Improving LLM Inferencing Under Speculative Decoding	Arun Ramachandran et.al.	2511.12031	null
2025-11-14	Seeing the Forest and the Trees: Query-Aware Tokenizer for Long-Video Multimodal Language Models	Siyou Li et.al.	2511.11910	null
2025-11-14	Experience-Guided Adaptation of Inference-Time Reasoning Strategies	Adam Stein et.al.	2511.11519	null
2025-11-14	W2S-AlignTree: Weak-to-Strong Inference-Time Alignment for Large Language Models via Monte Carlo Tree Search	Zhenyu Ding et.al.	2511.11518	null
2025-11-14	MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism	Shulin Liu et.al.	2511.11373	null
2025-11-14	iMAD: Intelligent Multi-Agent Debate for Efficient and Accurate LLM Inference	Wei Fan et.al.	2511.11306	null
2025-11-14	T-MAN: Enabling End-to-End Low-Bit LLM Inference on NPUs via Unified Table Lookup	Jianyu Wei et.al.	2511.11248	null
2025-11-14	STaR: Towards Cognitive Table Reasoning via Slow-Thinking Large Language Models	Huajian Zhang et.al.	2511.11233	null
2025-11-14	AccKV: Towards Efficient Audio-Video LLMs Inference via Adaptive-Focusing and Cross-Calibration KV Cache Optimization	Zhonghua Jiang et.al.	2511.11106	null
2025-11-14	GraphMASAL: A Graph-based Multi-Agent System for Adaptive Learning	Biqing Zeng et.al.	2511.11035	null
2025-11-14	DialogGraph-LLM: Graph-Informed LLMs for End-to-End Audio Dialogue Intent Recognition	HongYu Liu et.al.	2511.11000	null
2025-11-14	DEFT-LLM: Disentangled Expert Feature Tuning for Micro-Expression Recognition	Ren Zhang et.al.	2511.10948	null
2025-11-13	ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference	Yesheng Liang et.al.	2511.10645	null
2025-11-13	Scalable Synthesis of distributed LLM workloads through Symbolic Tensor Graphs	Changhai Man et.al.	2511.10480	null
2025-11-13	FactGuard: Event-Centric and Commonsense-Guided Fake News Detection	Jing He et.al.	2511.10281	null
2025-11-13	Efficient Thought Space Exploration through Strategic Intervention	Ziheng Li et.al.	2511.10038	null
2025-11-13	EnchTable: Unified Safety Alignment Transfer in Fine-tuned Large Language Models	Jialin Wu et.al.	2511.09880	null
2025-11-13	HierRouter: Coordinated Routing of Specialized Large Language Models via Reinforcement Learning	Nikunj Gupta et.al.	2511.09873	null
2025-11-12	From Street to Orbit: Training-Free Cross-View Retrieval via Location Semantics and LLM Guidance	Jeongho Min et.al.	2511.09820	null
2025-11-13	LLM Inference Beyond a Single Node: From Bottlenecks to Mitigations with Fast All-Reduce Communication	Prajwal Singhania et.al.	2511.09557	null
2025-11-12	Seer Self-Consistency: Advance Budget Estimation for Adaptive Test-Time Scaling	Shiyu Ji et.al.	2511.09345	null
2025-11-12	Mixture-of-Channels: Exploiting Sparse FFNs for Efficient LLMs Pre-Training and Inference	Tong Wu et.al.	2511.09323	null
2025-11-10	Hard vs. Noise: Resolving Hard-Noisy Sample Confusion in Recommender Systems via Large Language Models	Tianrui Song et.al.	2511.07295	null
2025-11-10	P3-LLM: An Integrated NPU-PIM Accelerator for LLM Inference Using Hybrid Numerical Formats	Yuzong Chen et.al.	2511.06838	null
2025-11-09	Optimizing Long-context LLM Serving via Fine-grained Sequence Parallelism	Cong Li et.al.	2511.06247	null
2025-11-09	LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs	Zifan He et.al.	2511.06174	null
2025-11-08	MoSKA: Mixture of Shared KV Attention for Efficient Long-Sequence LLM Inference	Myunghyun Rhee et.al.	2511.06010	null
2025-11-08	MCP-RiskCue: Can LLM infer risk information from MCP server System Logs?	Jiayi Fu et.al.	2511.05867	null
2025-11-06	Enabling Dynamic Sparsity in Quantized LLM Inference	Rongxiang Wang et.al.	2511.04477	null
2025-11-06	E-CARE: An Efficient LLM-based Commonsense-Augmented Framework for E-Commerce	Ge Zhang et.al.	2511.04087	null
2025-11-06	PICNIC: Silicon Photonic Interconnected Chiplets with Computational Network and In-memory Computing for LLM Inference Acceleration	Yue Jiet Chong et.al.	2511.04036	null
2025-11-06	LLM-Driven Adaptive Source-Sink Identification and False Positive Mitigation for Static Analysis	Shiyin Lin et.al.	2511.04023	null
2025-11-05	RAGBoost: Efficient Retrieval-Augmented Generation with Accuracy-Preserving Context Reuse	Yinsicheng Jiang et.al.	2511.03475	null
2025-11-07	UMDAM: A Unified Data Layout and DRAM Address Mapping for Heterogenous NPU-PIM	Hai Huang et.al.	2511.03293	null
2025-11-04	Optimal Singular Damage: Efficient LLM Inference in Low Storage Regimes	Mohammadsajad Alipour et.al.	2511.02681	null
2025-11-04	Federated Attention: A Distributed Paradigm for Collaborative LLM Inference over Edge Networks	Xiumei Deng et.al.	2511.02647	null
2025-11-04	Verifying LLM Inference to Prevent Model Weight Exfiltration	Roy Rinberg et.al.	2511.02620	null
2025-11-03	KV Cache Transform Coding for Compact Storage in LLM Inference	Konrad Staniszewski et.al.	2511.01815	null
2025-11-04	Collaborative Large Language Model Inference via Resource-Aware Parallel Speculative Decoding	Jungyeon Koh et.al.	2511.01695	null
2025-11-03	Scaling Graph Chain-of-Thought Reasoning: A Multi-Agent Framework with Efficient LLM Serving	Chengying Huan et.al.	2511.01633	null
2025-11-03	When, What, and How: Rethinking Retrieval-Enhanced Speculative Decoding	Min Fang et.al.	2511.01282	null
2025-11-04	CryptoMoE: Privacy-Preserving and Scalable Mixture of Experts Inference via Balanced Expert Routing	Yifan Zhou et.al.	2511.01197	null
2025-11-04	SpecDiff-2: Scaling Diffusion Drafter Alignment For Faster Speculative Decoding	Jameson Sandler et.al.	2511.00606	null
2025-11-01	FlashEVA: Accelerating LLM inference via Efficient Attention	Juan Gabriel Kostelec et.al.	2511.00576	null
2025-10-31	AMD MI300X GPU Performance Analysis	Chandrish Ambati et.al.	2510.27583	null
2025-10-31	Glia: A Human-Inspired AI for Automated Systems Design and Optimization	Pouya Hamadanian et.al.	2510.27176	null
2025-10-30	Beyond Benchmarks: The Economics of AI Inference	Boqin Zhuang et.al.	2510.26136	null
2025-10-31	AttnCache: Accelerating Self-Attention Inference for LLM Prefill via Attention Cache	Dinghong Song et.al.	2510.25979	null
2025-10-31	NeuronMM: High-Performance Matrix Multiplication for LLM Inference on AWS Trainium	Dinghong Song et.al.	2510.25977	null
2025-10-29	Serve Programs, Not Prompts	In Gim et.al.	2510.25412	null
2025-10-26	Batch Speculative Decoding Done Right	Ranran Haoran Zhang et.al.	2510.22876	null
2025-10-26	Do Stop Me Now: Detecting Boilerplate Responses with a Single Iteration	Yuval Kainan et.al.	2510.22679	null
2025-10-26	SABlock: Semantic-Aware KV Cache Eviction with Adaptive Compression Block Size	Jinhan Chen et.al.	2510.22556	null
2025-10-22	Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs	Hongyi Liu et.al.	2510.20064	null
2025-10-22	Are Large Language Models Sensitive to the Motives Behind Communication?	Addison J. Wu et.al.	2510.19687	null
2025-10-30	DiffAdapt: Difficulty-Adaptive Reasoning for Token-Efficient LLM Inference	Xiang Liu et.al.	2510.19669	null
2025-10-21	SLICE: SLO-Driven Scheduling for LLM Inference on Edge Computing Devices	Pan Zhou et.al.	2510.18544	null
2025-10-19	Justitia: Fair and Efficient Scheduling for LLM Applications	Mingyan Yang et.al.	2510.17015	null
2025-10-18	FourierCompress: Layer-Aware Spectral Activation Compression for Efficient and Accurate Collaborative LLM Inference	Jian Ma et.al.	2510.16418	null
2025-10-16	AMS-QUANT: Adaptive Mantissa Sharing for Floating-point Quantization	Mengtao Lv et.al.	2510.16045	null
2025-10-16	Kelle: Co-design KV Caching and eDRAM for Efficient LLM Serving in Edge Computing	Tianhua Xia et.al.	2510.16040	null
2025-10-28	TokenTiming: A Dynamic Alignment Method for Universal Speculative Decoding Model Pairs	Sibo Xiao et.al.	2510.15545	null
2025-10-16	Tail-Optimized Caching for LLM Inference	Wenxin Zhang et.al.	2510.15152	null
2025-10-16	xLLM Technical Report	Tongxuan Liu et.al.	2510.14686	null
2025-10-16	MX+: Pushing the Limits of Microscaling Formats for Efficient Large Language Model Serving	Jungi Lee et.al.	2510.14557	null
2025-10-16	FairBatching: Fairness-Aware Batch Formation for LLM Inference	Hongtao Lyu et.al.	2510.14392	null
2025-10-16	Qwen3Guard Technical Report	Haiquan Zhao et.al.	2510.14276	null
2025-10-15	Efficiently Executing High-throughput Lightweight LLM Inference Applications on Heterogeneous Opportunistic GPU Clusters with Pervasive Context Management	Thanh Son Phung et.al.	2510.14024	null
2025-10-15	Adaptive Rescheduling in Prefill-Decode Disaggregated LLM Inference	Zhibin Wang et.al.	2510.13668	null
2025-10-15	F-BFQ: Flexible Block Floating-Point Quantization Accelerator for LLMs	Jude Haris et.al.	2510.13401	null
2025-10-15	Taming the Fragility of KV Cache Eviction in LLM Inference	Yuan Feng et.al.	2510.13334	null
2025-10-15	Mirror Speculative Decoding: Breaking the Serial Barrier in LLM Inference	Nikhil Bhendawade et.al.	2510.13161	null
2025-10-14	Beyond Postconditions: Can Large Language Models infer Formal Contracts for Automatic Software Verification?	Cedric Richter et.al.	2510.12702	null
2025-10-14	Traveling Salesman-Based Token Ordering Improves Stability in Homomorphically Encrypted Language Models	Donghwan Rho et.al.	2510.12343	null
2025-10-13	Efficient LLM Inference over Heterogeneous Edge Networks with Speculative Decoding	Bingjie Zhu et.al.	2510.11331	null
2025-10-13	Efficient In-Memory Acceleration of Sparse Block Diagonal LLMs	João Paulo Cardoso de Lima et.al.	2510.11192	null
2025-10-11	CacheClip: Accelerating RAG with Effective KV Cache Reuse	Bin Yang et.al.	2510.10129	null
2025-10-10	FLRC: Fine-grained Low-Rank Compressor for Efficient LLM Inference	Yu-Chen Lu et.al.	2510.09332	null
2025-10-10	Semantic-Condition Tuning: Fusing Graph Context with Large Language Models for Knowledge Graph Completion	Ruitong Liu et.al.	2510.08966	null
2025-10-13	Autoencoding-Free Context Compression for LLMs via Contextual Semantic Anchors	Xin Liu et.al.	2510.08907	null
2025-10-09	SPAD: Specialized Prefill and Decode Hardware for Disaggregated LLM Inference	Hengrui Zhang et.al.	2510.08544	null
2025-10-09	From Tokens to Layers: Redefining Stall-Free Scheduling for LLM Serving with Layered Prefill	Gunjun Lee et.al.	2510.08055	null
2025-10-09	Augur: Modeling Covariate Causal Associations in Time Series via Large Language Models	Zhiqing Cui et.al.	2510.07858	null
2025-10-09	OBCache: Optimal Brain KV Cache Pruning for Efficient Long-Context LLM Inference	Yuzhe Gu et.al.	2510.07651	null
2025-10-08	Accelerating Diffusion LLM Inference via Local Determinism Propagation	Fanheng Kong et.al.	2510.07081	null
2025-10-08	Accelerating Sparse Ternary GEMM for Quantized LLM inference on Apple Silicon	Baraq Lipshitz et.al.	2510.06957	null
2025-10-07	VecInfer: Efficient LLM Inference with Low-Bit KV Cache via Outlier-Suppressed Vector Quantization	Dingyu Yao et.al.	2510.06175	null
2025-10-07	lm-Meter: Unveiling Runtime Inference Latency for On-Device Language Models	Haoxin Wang et.al.	2510.06126	null
2025-10-07	From Principles to Practice: A Systematic Study of LLM Serving on Multi-core NPUs	Tianhao Zhu et.al.	2510.05632	null
2025-10-06	KVLinC: KV Cache Quantization with Hadamard Rotation and Linear Correction	Utkarsh Saxena et.al.	2510.05373	null
2025-10-06	A novel hallucination classification framework	Maksym Zavhorodnii et.al.	2510.05189	null
2025-10-06	RevMine: An LLM-Assisted Tool for Code Review Mining and Analysis Across Git Platforms	Samah Kansab et.al.	2510.04796	null
2025-10-05	Speculative Actions: A Lossless Framework for Faster Agentic Systems	Naimeng Ye et.al.	2510.04371	null
2025-10-03	Best-of-Majority: Minimax-Optimal Strategy for Pass@$k$ Inference Scaling	Qiwei Di et.al.	2510.03199	null
2025-10-03	Dissecting Transformers: A CLEAR Perspective towards Green AI	Hemang Jain et.al.	2510.02810	null
2025-10-03	HALO: Memory-Centric Heterogeneous Accelerator with 2.5D Integration for Low-Batch LLM Inference	Shubham Negi et.al.	2510.02675	null
2025-10-01	PolyLink: A Blockchain Based Decentralized Edge AI Platform for LLM Inference	Hongbo Liu et.al.	2510.02395	null
2025-10-03	Enhancing Large Language Model Reasoning with Reward Models: An Analytical Survey	Qiyuan Liu et.al.	2510.01925	null
2025-10-02	SCRIBES: Web-Scale Script-Based Semi-Structured Data Extraction with Reinforcement Learning	Shicheng Liu et.al.	2510.01832	null
2025-10-01	HiSpec: Hierarchical Speculative Decoding for LLMs	Avinash Kumar et.al.	2510.01336	null
2025-10-01	Generalized Parallel Scaling with Interdependent Generations	Harry Dong et.al.	2510.01143	null
2025-10-01	AdaBlock-dLLM: Semantic-Aware Diffusion LLM Inference via Adaptive Block Size	Guanxi Lu et.al.	2509.26432	null
2025-09-30	Parallax: Efficient LLM Inference Service over Decentralized Environment	Chris Tong et.al.	2509.26182	null
2025-09-30	Accelerating LLM Inference with Precomputed Query Storage	Jay H. Park et.al.	2509.25919	null
2025-09-30	SAIL: SRAM-Accelerated LLM Inference System with Lookup-Table-based GEMV	Jingyao Zhang et.al.	2509.25853	null
2025-09-29	SemShareKV: Efficient KVCache Sharing for Semantically Similar Prompts via Token-Level LSH Matching	Xinye Zhao et.al.	2509.24832	null
2025-09-29	Speculative Verification: Exploiting Information Gain to Refine Speculative Decoding	Sungkyun Kim et.al.	2509.24328	null
2025-09-29	VeriLLM: A Lightweight Framework for Publicly Verifiable Decentralized Inference	Ke Wang et.al.	2509.24257	null
2025-09-28	Collaborative Device-Cloud LLM Inference through Reinforcement Learning	Wenzhi Fang et.al.	2509.24050	null
2025-10-01	A Predictive and Synergistic Two-Layer Scheduling Framework for LLM Serving	Yue Zhang et.al.	2509.23384	null
2025-09-27	Scaling LLM Test-Time Compute with Mobile NPU on Smartphones	Zixu Hao et.al.	2509.23324	null
2025-09-27	Bridging the Gap Between Promise and Performance for Microscaling FP4 Quantization	Vage Egiazarian et.al.	2509.23202	null
2025-09-26	Lightweight error mitigation strategies for post-training N:M activation sparsity in LLMs	Shirin Alanova et.al.	2509.22166	null
2025-09-26	Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding	Shijing Hu et.al.	2509.22134	null
2025-09-26	SimulSense: Sense-Driven Interpreting for Efficient Simultaneous Speech Translation	Haotian Tan et.al.	2509.21932	null
2025-09-25	Preemptive Detection and Steering of LLM Misalignment via Latent Reachability	Sathwik Karnik et.al.	2509.21528	null
2025-09-25	Semantic Edge-Cloud Communication for Real-Time Urban Traffic Surveillance with ViT and LLMs over Mobile Networks	Murat Arda Onsu et.al.	2509.21259	null
2025-09-24	FastEagle: Cascaded Drafting for Accelerating Speculative Decoding	Haiduo Huang et.al.	2509.20416	null
2025-09-24	Q-Palette: Fractional-Bit Quantizers Toward Optimal Bit Allocation for Efficient LLM Deployment	Deokjae Lee et.al.	2509.20214	null
2025-09-24	Gyges: Dynamic Cross-Instance Parallelism Transformation for Efficient LLM Inference	Haoyu Chen et.al.	2509.19729	null
2025-09-23	Confidential LLM Inference: Performance and Cost Across CPU and GPU TEEs	Marcin Chrapek et.al.	2509.18886	null
2025-09-22	Multimodal Health Risk Prediction System for Chronic Diseases via Vision-Language Fusion and Large Language Models	Dingxin Lu et.al.	2509.18221	null
2025-09-28	Disaggregated Prefill and Decoding Inference System for Large Language Model Serving on Multi-Vendor GPUs	Xing Chen et.al.	2509.17542	null
2025-09-22	Cronus: Efficient LLM inference on Heterogeneous GPU Clusters via Partially Disaggregated Prefill	Yunzhao Liu et.al.	2509.17357	null
2025-09-22	Multi-View Attention Multiple-Instance Learning Enhanced by LLM Reasoning for Cognitive Distortion Detection	Jun Seo Kim et.al.	2509.17292	null
2025-09-21	MoA-Off: Adaptive Heterogeneous Modality-Aware Offloading with Edge-Cloud Collaboration for Efficient Multimodal LLM Inference	Zheming Yang et.al.	2509.16995	null
2025-09-20	Shift Parallelism: Low-Latency, High-Throughput LLM Inference for Dynamic Workloads	Mert Hidayetoglu et.al.	2509.16495	null
2025-09-19	LightCode: Compiling LLM Inference for Photonic-Electronic Systems	Ryan Tomich et.al.	2509.16443	null
2025-09-19	LLM Cache Bandit Revisited: Addressing Query Heterogeneity for Cost-Effective LLM Inference	Hantao Yang et.al.	2509.15515	null
2025-09-18	A1: Asynchronous Test-Time Scaling via Conformal Prediction	Jing Xiong et.al.	2509.15148	null
2025-09-18	LEAP: LLM Inference on Scalable PIM-NoC Architecture with Balanced Dataflow and Fine-Grained Parallelism	Yimin Wang et.al.	2509.14781	null
2025-09-18	LLM Jailbreak Detection for (Almost) Free!	Guorui Chen et.al.	2509.14558	null
2025-09-17	TENET: An Efficient Sparsity-Aware LUT-Centric Architecture for Ternary LLM Inference On Edge	Zhirui Huang et.al.	2509.13765	null
2025-09-16	Scaling Up Throughput-oriented LLM Inference Applications on Heterogeneous Opportunistic GPU Clusters with Pervasive Context Management	Thanh Son Phung et.al.	2509.13201	null
2025-09-16	HPIM: Heterogeneous Processing-In-Memory-based Accelerator for Large Language Models Inference	Cenlin Duan et.al.	2509.12993	null
2025-09-15	Beyond PII: How Users Attempt to Estimate and Mitigate Implicit LLM Inference	Synthia Wang et.al.	2509.12152	null
2025-09-14	Framing AI System Benchmarking as a Learning Task: FlexBench and the Open MLPerf Dataset	Grigori Fursin et.al.	2509.11413	null
2025-09-14	PersonaX: Multimodal Datasets with LLM-Inferred Behavior Traits	Loka Li et.al.	2509.11362	null
2025-09-14	AQUA: Attention via QUery mAgnitudes for Memory and Compute Efficient Inference in LLMs	Santhosh G S et.al.	2509.11155	null
2025-09-12	MCBP: A Memory-Compute Efficient LLM Inference Accelerator Leveraging Bit-Slice-enabled Sparsity and Repetitiveness	Huizheng Wang et.al.	2509.10372	null
2025-09-11	LAVa: Layer-wise KV Cache Eviction with Dynamic Budget Allocation	Yiqun Shen et.al.	2509.09754	null
2025-09-11	Combating the Memory Walls: Optimization Pathways for Long-Context Agentic LLM Inference	Haoran Wu et.al.	2509.09505	null
2025-08-06	Frontier: Simulating the Next Generation of LLM Inference Systems	Yicheng Feng et.al.	2508.03148	null
2025-07-25	Cloud Native System for LLM Inference Serving	Minxian Xu et.al.	2507.18007	null
2025-07-23	BucketServe: Bucket-Based Dynamic Batching for Smart and Efficient LLM Inference Serving	Wanyi Zheng et.al.	2507.17120	null
2025-07-22	Identifying Pre-training Data in LLMs: A Neuron Activation-Based Detection Framework	Hongyi Tang et.al.	2507.16414	null
2025-07-21	Efficient Routing of Inference Requests across LLM Instances in Cloud-Edge Computing	Shibo Yu et.al.	2507.15553	null
2025-07-18	Efficient LLM Inference: Bandwidth, Compute, Synchronization, and Capacity are all you need	Michael Davies et.al.	2507.14397	null
2025-07-18	Can LLMs Infer Personality from Real World Conversations?	Jianfeng Zhu et.al.	2507.14355	null
2025-07-23	Photonic Fabric Platform for AI Accelerators	Jing Ding et.al.	2507.14000	null
2025-07-18	LoopServe: An Adaptive Dual-phase LLM Inference Acceleration System for Multi-Turn Dialogues	Haoyang Li et.al.	2507.13681	null
2025-07-16	Toward Efficient SpMV in Sparse LLMs via Block Extraction and Compressed Storage	Junqing Lin et.al.	2507.12205	null
2025-07-15	MIRAGE: KV Cache Optimization through Parameter Remapping for Multi-tenant LLM Serving	Ruihao Li et.al.	2507.11507	null
2025-07-15	Quantifying the Energy Consumption and Carbon Emissions of LLM Inference via Simulations	Miray Özcan et.al.	2507.11417	null
2025-07-14	Green-LLM: Optimal Workload Allocation for Environmentally-Aware Distributed Inference	Jiaming Cheng et.al.	2507.09942	null
2025-07-12	SLIM: A Heterogeneous Accelerator for Edge Inference of Sparse Large Language Model via Adaptive Thresholding	Weihong Xu et.al.	2507.09201	null
2025-07-11	On Evaluating Performance of LLM Inference Serving Systems	Amey Agrawal et.al.	2507.09019	null
2025-07-11	Hybrid Systolic Array Accelerator with Optimized Dataflow for Edge Large Language Model Inference	Chun-Ting Chen et.al.	2507.09010	null
2025-07-11	InferLog: Accelerating LLM Inference for Online Log Parsing via ICL-oriented Prefix Caching	Yilun Wang et.al.	2507.08523	null
2025-07-10	Reasoning and Behavioral Equilibria in LLM-Nash Games: From Mindsets to Actions	Quanyan Zhu et.al.	2507.08208	null
2025-07-10	Krul: Efficient State Restoration for Multi-turn Conversations with Dynamic Cross-layer KV Sharing	Junyi Wen et.al.	2507.08045	null
2025-07-15	Hallucination Stations: On Some Basic Limitations of Transformer-Based Language Models	Varin Sikka et.al.	2507.07505	null
2025-07-11	QUEST: Query Optimization in Unstructured Document Analysis	Zhaoze Sun et.al.	2507.06515	null
2025-07-08	Voltage Regulation in Distribution Systems with Data Center Loads	Yize Chen et.al.	2507.06416	null
2025-07-07	Cascade: Token-Sharded Private LLM Inference	Rahul Thomas et.al.	2507.05228	null
2025-07-07	Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models?	Yun Qu et.al.	2507.04632	null
2025-07-05	Enhancing Adaptive Behavioral Interventions with LLM Inference from Participant-Described States	Karine Karine et.al.	2507.03871	null
2025-07-05	OrthoRank: Token Selection via Sink Token Orthogonality for Efficient LLM inference	Seungjun Shin et.al.	2507.03865	null
2025-07-04	Hummingbird: A Smaller and Faster Large Language Model Accelerator on Embedded FPGA	Jindong Li et.al.	2507.03308	null
2025-07-03	HGCA: Hybrid GPU-CPU Attention for Long Context LLM Inference	Weishu Deng et.al.	2507.03153	null
2025-07-03	On the Convergence of Large Language Model Optimizer for Black-Box Network Management	Hoon Lee et.al.	2507.02689	null
2025-07-03	Breaking the HBM Bit Cost Barrier: Domain-Specific ECC for AI Inference Infrastructure	Rui Xie et.al.	2507.02654	null
2025-07-03	FlowSpec: Continuous Pipelined Speculative Decoding for Efficient Distributed LLM Inference	Xing Liu et.al.	2507.02620	null
2025-07-02	Dissecting the Impact of Mobile DVFS Governors on LLM Inference Performance and Energy Efficiency	Zongpu Zhang et.al.	2507.02135	null
2025-07-02	LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation	Tianyu Liu et.al.	2507.01449	null
2025-07-02	SpeechAccentLLM: A Unified Framework for Foreign Accent Conversion and Text to Speech	Cheng Zhuangfei et.al.	2507.01348	null
2025-07-02	La RoSA: Enhancing LLM Efficiency via Layerwise Rotated Sparse Activation	Kai Liu et.al.	2507.01299	null
2025-07-01	VEDA: Efficient LLM Generation Through Voting-based KV Cache Eviction and Dataflow-flexible Accelerator	Zhican Wang et.al.	2507.00797	null
2025-07-01	Cognitive Load-Aware Inference: A Neuro-Symbolic Framework for Optimizing the Token Economy of Large Language Models	Yilun Zhang et.al.	2507.00653	null
2025-07-01	LLM-Mesh: Enabling Elastic Sharing for Serverless LLM Inference	Chuhao Xu et.al.	2507.00507	null
2025-07-01	Serving LLMs in HPC Clusters: A Comparative Study of Qualcomm Cloud AI 100 Ultra and High-Performance GPUs	Mohammad Firas Sada et.al.	2507.00418	null
2025-06-30	Federated Learning-Enabled Hybrid Language Models for Communication-Efficient Token Transmission	Faranaksadat Solat et.al.	2507.00082	null
2025-06-27	QuickSilver -- Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization	Danush Khanna et.al.	2506.22396	null
2025-06-27	Towards Operational Data Analytics Chatbots -- Virtual Knowledge Graph is All You Need	Junaid Ahmed Khan et.al.	2506.22267	null
2025-06-27	SiPipe: Bridging the CPU-GPU Utilization Gap for Efficient Pipeline-Parallel LLM Inference	Yongchao He et.al.	2506.22033	null
2025-06-30	A Survey of LLM Inference Systems	James Pan et.al.	2506.21901	null
2025-06-17	Utility-Driven Speculative Decoding for Mixture-of-Experts	Anish Saxena et.al.	2506.20675	null
2025-07-02	Breaking the Boundaries of Long-Context LLM Inference: Adaptive KV Management on a Single Commodity GPU	He Sun et.al.	2506.20187	null
2025-06-24	MNN-AECS: Energy Optimization for LLM Decoding on Mobile Devices via Adaptive Core Selection	Zhengxiang Huang et.al.	2506.19884	null
2025-06-23	Black-Box Test Code Fault Localization Driven by Large Language Models and Execution Estimation	Ahmadreza Saboor Yaraghi et.al.	2506.19045	null
2025-06-23	WiLLM: An Open Wireless LLM Communication System	Boyi Liu et.al.	2506.19030	null
2025-06-23	CommVQ: Commutative Vector Quantization for KV Cache Compression	Junyan Li et.al.	2506.18879	null
2025-06-22	Mechanistic Interpretability in the Presence of Architectural Obfuscation	Marcos Florencio et.al.	2506.18053	null
2025-06-20	Towards AI Search Paradigm	Yuchen Li et.al.	2506.17188	null
2025-06-17	CrEst: Credibility Estimation for Contexts in LLMs via Weak Supervision	Dyah Adila et.al.	2506.14912	null
2025-06-16	Vector Ontologies as an LLM world view extraction method	Kaspar Rothenfusser et.al.	2506.13252	link
2025-06-13	Semantic Scheduling for LLM Inference	Wenyue Hua et.al.	2506.12204	link
2025-06-13	GraphRAG-Causal: A novel graph-augmented framework for causal reasoning and annotation in news	Abdul Haque et.al.	2506.11600	null
2025-06-13	Collaborative LLM Inference via Planning for Efficient Reasoning	Byeongchan Lee et.al.	2506.11578	null
2025-06-13	Efficient Long-Context LLM Inference via KV Cache Clustering	Jie Hu et.al.	2506.11418	null
2025-06-12	TD-Pipe: Temporally-Disaggregated Pipeline Parallelism Architecture for High-Throughput LLM Inference	Hongbin Zhang et.al.	2506.10470	null
2025-06-11	A First Look at Bugs in LLM Inference Engines	Mugeng Liu et.al.	2506.09713	link
2025-06-12	Understanding the Performance and Power of LLM Inferencing on Edge Accelerators	Mayank Arya et.al.	2506.09554	null
2025-06-11	Give Me FP32 or Give Me Death? Challenges and Solutions for Reproducible Reasoning	Jiayi Yuan et.al.	2506.09501	null
2025-06-10	Efficient Context Selection for Long-Context QA: No Tuning, No Iteration, Just Adaptive-$k$	Chihiro Taguchi et.al.	2506.08479	null
2025-06-10	Draft-based Approximate Inference for LLMs	Kevin Galim et.al.	2506.08373	link
2025-06-09	MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts	Wei Tao et.al.	2506.07533	null
2025-06-07	Containerized In-Storage Processing and Computing-Enabled SSD Disaggregation	Miryeong Kwon et.al.	2506.06769	null
2025-06-06	Towards Efficient Multi-LLM Inference: Characterization and Analysis of LLM Routing and Hierarchical Techniques	Adarsh Prasad Behera et.al.	2506.06579	null
2025-06-04	On the Fundamental Impossibility of Hallucination Control in Large Language Models	Michał P. Karpowicz et.al.	2506.06382	null
2025-06-04	SkipGPT: Dynamic Layer Pruning Reinvented with Token Awareness and Module Decoupling	Anhao Zhao et.al.	2506.04179	null
2025-06-04	Pre$^3$: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation	Junyi Chen et.al.	2506.03887	null
2025-06-04	Client-Side Zero-Shot LLM Inference for Comprehensive In-Browser URL Analysis	Avihay Cohen et.al.	2506.03656	null
2025-06-04	POSS: Position Specialist Generates Better Draft for Speculative Decoding	Langlin Huang et.al.	2506.03566	link
2025-06-07	Parallel CPU-GPU Execution for LLM Inference on Constrained GPUs	Jiakun Fan et.al.	2506.03296	null
2025-06-03	Sample, Predict, then Proceed: Self-Verification Sampling for Tool Use of LLMs	Shangmin Guo et.al.	2506.02918	null
2025-06-03	HATA: Trainable and Hardware-Efficient Hash-Aware Top-k Attention for Scalable Large Model Inference	Ping Gong et.al.	2506.02572	link
2025-06-02	Memory Access Characterization of Large Language Models in CPU Environment and its Potential Impacts	Spencer Banasik et.al.	2506.01827	null
2025-05-30	Are Optimal Algorithms Still Optimal? Rethinking Sorting in LLM-Based Pairwise Ranking with Batching and Caching	Juan Wisznia et.al.	2505.24643	null
2025-05-30	LLM Inference Enhanced by External Knowledge: A Survey	Yu-Hsuan Lin et.al.	2505.24377	link
2025-05-30	SkyLB: A Locality-Aware Cross-Region Load Balancer for LLM Inference	Tian Xia et.al.	2505.24095	null
2025-05-29	Large Language Model Meets Constraint Propagation	Alexandre Bonlarron et.al.	2505.24012	null
2025-05-29	Ghidorah: Fast LLM Inference on Edge with Speculative Decoding and Hetero-Core Parallelism	Jinhui Wei et.al.	2505.23219	null
2025-05-29	SCORPIO: Serving the Right Requests at the Right Time for Heterogeneous SLOs in LLM Inference	Yinghao Tang et.al.	2505.23022	null
2025-05-28	Mustafar: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference	Donghyeon Joo et.al.	2505.22913	link
2025-05-28	Towards Efficient Key-Value Cache Management for Prefix Prefilling in LLM Inference	Yue Zhu et.al.	2505.21919	null
2025-05-28	HoliTom: Holistic Token Merging for Fast Video Large Language Models	Kele Shao et.al.	2505.21334	link
2025-05-28	FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration	Daehyeon Baek et.al.	2505.20839	null
2025-05-26	HAMburger: Accelerating LLM Inference via Token Smashing	Jingyu Liu et.al.	2505.20438	null
2025-05-26	MoESD: Unveil Speculative Decoding's Potential for Accelerating Sparse MoE	Zongle Huang et.al.	2505.19645	null
2025-05-26	WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference	Sihan Chen et.al.	2505.19427	link
2025-05-25	DECA: A Near-Core LLM Decompression Accelerator Supporting Out-of-Order Invocation	Gerasimos Gerogiannis et.al.	2505.19349	null
2025-06-03	A Survey of LLM $\times$ DATA	Xuanhe Zhou et.al.	2505.18458	null
2025-05-23	An Attack to Break Permutation-Based Private Third-Party Inference Schemes for LLMs	Rahul Thomas et.al.	2505.18332	null
2025-05-23	NSNQuant: A Double Normalization Approach for Calibration-Free Low-Bit Vector Quantization of KV Cache	Donghyun Son et.al.	2505.18231	null
2025-05-23	Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning	Michael Hassid et.al.	2505.17813	null
2025-05-23	DASH: Input-Aware Dynamic Layer Skipping for Efficient LLM Inference with Markov Decision Policies	Ning Yang et.al.	2505.17420	null
2025-05-22	RAP: Runtime-Adaptive Pruning for LLM Inference	Huanrong Liu et.al.	2505.17138	null
2025-05-22	CASTILLO: Characterizing Response Length Distributions of Large Language Models	Daniel F. Perez-Ramirez et.al.	2505.16881	link
2025-05-22	Reading Between the Prompts: How Stereotypes Shape LLM's Implicit Personalization	Vera Neplenbroek et.al.	2505.16467	link
2025-05-22	QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design	Benjamin Schneider et.al.	2505.16175	link
2025-05-22	KNN-SSD: Enabling Dynamic Self-Speculative Decoding via Nearest Neighbor Layer Set Optimization	Mingbo Song et.al.	2505.16162	null
2025-05-20	Polar Sparsity: High Throughput Batched LLM Inferencing with Scalable Contextual Sparsity	Susav Shrestha et.al.	2505.14884	link
2025-05-20	ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions	Bufang Yang et.al.	2505.14668	null
2025-05-20	ServerlessLoRA: Minimizing Latency and Cost in Serverless Inference for LoRA-Based LLMs	Yifan Sui et.al.	2505.14468	null
2025-05-16	An agentic system with reinforcement-learned subsystem improvements for parsing form-like documents	Ayesha Amjad et.al.	2505.13504	null
2025-05-19	HeteroSpec: Leveraging Contextual Heterogeneity for Efficient Speculative Decoding	Siran Liu et.al.	2505.13254	null
2025-05-19	FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference	Guangda Liu et.al.	2505.13109	null
2025-05-19	FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks	Zihua Wang et.al.	2505.12728	link
2025-05-17	Enhancing Complex Instruction Following for Large Language Models with Mixture-of-Contexts Fine-tuning	Yuheng Lu et.al.	2505.11922	null
2025-05-17	Arrow: Adaptive Scheduling Mechanisms for Disaggregated LLM Inference Architecture	Yu Wu et.al.	2505.11916	null
2025-05-16	TokenWeave: Efficient Compute-Communication Overlap for Distributed LLM Inference	Raja Gond et.al.	2505.11329	link
2025-05-16	Vaiage: A Multi-Agent Solution to Personalized Travel Planning	Binwen Liu et.al.	2505.10922	null
2025-05-19	SpecOffload: Unlocking Latent GPU Capacity for LLM Inference on Resource-Constrained Devices	Xiangwen Zhuge et.al.	2505.10259	link
2025-05-15	ServeGen: Workload Characterization and Generation of Large Language Model Serving in Production	Yuxing Xiang et.al.	2505.09999	link
2025-05-15	How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference	Nidhal Jegham et.al.	2505.09598	null
2025-05-14	Statistical Modeling and Uncertainty Estimation of LLM Inference Systems	Kaustabha Ray et.al.	2505.09319	null
2025-05-15	ELIS: Efficient LLM Iterative Scheduling System with Response Length Predictor	Seungbeom Choi et.al.	2505.09142	null
2025-05-13	LibVulnWatch: A Deep Assessment Agent System and Leaderboard for Uncovering Hidden Vulnerabilities in Open-Source AI Libraries	Zekun Wu et.al.	2505.08842	null
2025-05-13	Automatic Task Detection and Heterogeneous LLM Speculative Decoding	Danying Ge et.al.	2505.08600	null
2025-05-08	Scaling Laws for Speculative Decoding	Siyuan Yan et.al.	2505.07858	null
2025-05-12	SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models	Hang Wu et.al.	2505.07680	null
2025-05-12	Comet: Accelerating Private Inference for Large Language Model by Predicting Activation Sparsity	Guang Yan et.al.	2505.07239	null
2025-05-12	PrefillOnly: An Inference Engine for Prefill-only Workloads in Large Language Model Applications	Kuntai Du et.al.	2505.07203	null
2025-05-14	I Know What You Said: Unveiling Hardware Cache Side-Channels in Local Large Language Model Inference	Zibo Gao et.al.	2505.06738	null
2025-05-09	Challenging GPU Dominance: When CPUs Outperform for On-Device LLM Inference	Haolin Zhang et.al.	2505.06461	null
2025-05-09	Sparse Attention Remapping with Clustering for Efficient LLM Decoding on PIM	Zehao Fan et.al.	2505.05772	null
2025-05-08	HEXGEN-TEXT2SQL: Optimizing LLM Inference Request Scheduling for Agentic Text-to-SQL Workflow	You Peng et.al.	2505.05286	link
2025-05-06	Faster MoE LLM Inference for Extremely Large Models	Haoqi Yang et.al.	2505.03531	null
2025-05-05	RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference	Yaoqi Chen et.al.	2505.02922	null
2025-05-03	High-Fidelity Pseudo-label Generation by Large Language Models for Training Robust Radiology Report Classifiers	Brian Wong et.al.	2505.01693	null
2025-05-08	A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency	Sihyeong Park et.al.	2505.01658	link
2025-05-02	PipeSpec: Breaking Stage Dependencies in Hierarchical LLM Decoding	Bradley McDanel et.al.	2505.01572	null
2025-04-28	AutoJudge: Judge Decoding Without Manual Annotation	Roman Garipov et.al.	2504.20039	null
2025-04-28	Taming the Titans: A Survey of Efficient LLM Inference Serving	Ranran Zhen et.al.	2504.19720	link
2025-04-28	R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference	Zhenyu Zhang et.al.	2504.19449	null
2025-05-07	A Simple Ensemble Strategy for LLM Inference: Towards More Stable Text Classification	Junichiro Niimi et.al.	2504.18884	link
2025-04-29	PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation	Zihao An et.al.	2504.18583	null
2025-04-25	PropRAG: Guiding Retrieval with Beam Search over Proposition Paths	Jingjin Wang et.al.	2504.18070	null
2025-04-24	L3: DIMM-PIM Integrated Architecture and Coordination for Scalable Long-Context LLM Inference	Qingyuan Liu et.al.	2504.17584	null
2025-04-24	On-Device Qwen2.5: Efficient LLM Inference with Model Compression and Hardware Acceleration	Maoyang Xiang et.al.	2504.17376	null
2025-04-18	HPU: High-Bandwidth Processing Unit for Scalable, Cost-effective LLM Inference via GPU Co-processing	Myunghyun Rhee et.al.	2504.16112	null
2025-04-22	Token-Aware Coding Flow: A Study with Nano Surge in Reasoning Model	Junwei Hu et.al.	2504.15989	null
2025-04-23	KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments	Junyoung Park et.al.	2504.15364	null
2025-04-18	High-Throughput LLM inference on Heterogeneous Clusters	Yi Xiong et.al.	2504.15303	null
2025-04-21	Hardware-based Heterogeneous Memory Management for Large Language Model Inference	Soojin Hwang et.al.	2504.14893	null
2025-04-19	Accelerating LLM Inference with Flexible N:M Sparsity via A Fully Digital Compute-in-Memory Accelerator	Akshat Ramachandran et.al.	2504.14365	null
2025-04-19	FGMP: Fine-Grained Mixed-Precision Weight and Activation Quantization for Hardware-Accelerated LLM Inference	Coleman Hooper et.al.	2504.14152	null
2025-04-16	Cost-Efficient LLM Serving in the Cloud: VM Selection with KV Cache Offloading	Kihyun Kim et.al.	2504.11816	link
2025-04-16	Shared Disk KV Cache Management for Efficient Multi-Instance Inference in RAG-Powered LLMs	Hyungwoo Lee et.al.	2504.11765	null
2025-04-16	Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures	Prabhu Vellaisamy et.al.	2504.11750	null
2025-04-15	Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints	Ruicheng Ao et.al.	2504.11320	link
2025-04-14	HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving	Avinash Kumar et.al.	2504.10724	null
2025-04-14	AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference	Yangshen Deng et.al.	2504.10326	null
2025-04-14	KeepKV: Eliminating Output Perturbation in KV Cache Compression for Efficient LLMs Inference	Yuxuan Tian et.al.	2504.09936	null
2025-04-22	Understanding and Optimizing Multi-Stage AI Inference Pipelines	Abhimanyu Rajeshkumar Bambhaniya et.al.	2504.09775	null
2025-04-13	LoopLynx: A Scalable Dataflow Architecture for Efficient LLM Inference	Jianing Zheng et.al.	2504.09561	link
2025-04-12	MoE-Lens: Towards the Hardware Limit of High-Throughput MoE LLM Serving Under Resource Constraints	Yichao Yuan et.al.	2504.09345	null
2025-04-11	SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting	Jiaming Xu et.al.	2504.08850	null
2025-04-10	SD$^2$: Self-Distilled Sparse Drafters	Mike Lasby et.al.	2504.08838	null
2025-04-11	Scaling Up On-Device LLMs via Active-Weight Swapping Between DRAM and Flash	Fucheng Jia et.al.	2504.08378	null
2025-04-11	Jupiter: Fast and Resource-Efficient Collaborative Inference of Generative LLMs on Edge Devices	Shengyuan Ye et.al.	2504.08242	null
2025-04-10	Token Level Routing Inference System for Edge Devices	Jianshu She et.al.	2504.07878	null
2025-04-10	Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving	Shihong Gao et.al.	2504.07494	link
2025-04-10	UniCAIM: A Unified CAM/CIM Architecture with Static-Dynamic KV Cache Pruning for Efficient Long-Context LLM Inference	Weikai Xu et.al.	2504.07479	null
2025-04-10	Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents	Yueying Li et.al.	2504.07347	null
2025-04-08	SPIRe: Boosting LLM Inference Throughput with Speculative Decoding	Sanjit Neelam et.al.	2504.06419	null
2025-04-08	Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching	Yanhao Dong et.al.	2504.06319	null
2025-04-09	Hogwild! Inference: Parallel LLM Generation via Concurrent Attention	Gleb Rodionov et.al.	2504.06261	link
2025-04-11	User Feedback Alignment for LLM-powered Exploration in Large-scale Recommendation Systems	Jianling Wang et.al.	2504.05522	null
2025-04-07	Evaluating Knowledge Graph Based Retrieval Augmented Generation Methods under Knowledge Incompleteness	Dongzhuoran Zhou et.al.	2504.05163	null
2025-04-04	Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency	Erik Johannes Husom et.al.	2504.03360	null
2025-04-04	Efficient Dynamic Clustering-Based Document Compression for Retrieval-Augmented-Generation	Weitao Li et.al.	2504.03165	link
2025-04-03	Narrative Studio: Visual narrative exploration using LLMs and Monte Carlo Tree Search	Parsa Ghaffari et.al.	2504.02426	link
2025-04-01	SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching	Yuxuan Zhu et.al.	2504.00970	null
2025-04-03	Token-Driven GammaTune: Adaptive Calibration for Enhanced Speculative Decoding	Aayush Gautam et.al.	2504.00030	null
2025-04-06	ReaLM: Reliable and Efficient Large Language Model Inference with Statistical Algorithm-Based Fault Tolerance	Tong Xie et.al.	2503.24053	link
2025-03-31	MVDRAM: Enabling GeMV Execution in Unmodified DRAM for Low-Bit LLM Acceleration	Tatsuya Kubo et.al.	2503.23817	null
2025-03-30	Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference	Wei Tao et.al.	2503.23294	null
2025-03-28	Niyama: Breaking the Silos of LLM Inference Serving	Kanishk Goel et.al.	2503.22562	null
2025-03-25	LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation	Han Chen et.al.	2503.19950	link
2025-03-24	xKV: Cross-Layer SVD for KV-Cache Compression	Chi-Chih Chang et.al.	2503.18893	link
2025-03-27	Reimagining Memory Access for LLM Inference: Compression-Aware Memory Controller Design	Rui Xie et.al.	2503.18869	null
2025-03-24	Jenga: Effective Memory Management for Serving LLM with Heterogeneity	Chen Zhang et.al.	2503.18292	null
2025-03-27	WindowKV: Task-Adaptive Group-Wise KV Cache Window Selection for Efficient LLM Inference	Youhui Zuo et.al.	2503.17922	link
2025-03-22	PipeBoost: Resilient Pipelined Architecture for Fast Serverless LLM Scaling	Chongpeng Liu et.al.	2503.17707	null
2025-03-21	V-Seek: Accelerating LLM Reasoning on Open-hardware Server-class RISC-V Platforms	Javier J. Poveda Rodrigo et.al.	2503.17422	null
2025-03-21	Improving the End-to-End Efficiency of Offline Inference for Multi-LLM Applications Based on Sampling and Simulation	Jingzhi Fang et.al.	2503.16893	null
2025-03-20	SPIN: Accelerating Large Language Model Inference with Heterogeneous Speculative Models	Fahao Chen et.al.	2503.15921	null
2025-03-19	Automated Non-Functional Requirements Generation in Software Engineering with Large Language Models: A Comparative Study	Jomar Thomas Almonte et.al.	2503.15248	null
2025-03-19	Communication-Efficient Distributed On-Device LLM Inference Over Wireless Networks	Kai Zhang et.al.	2503.14882	null
2025-03-18	PLAY2PROMPT: Zero-shot Tool Instruction Optimization for LLM Agents via Tool Play	Wei Fang et.al.	2503.14432	null
2025-03-17	Mitigating KV Cache Competition to Enhance User Experience in LLM Inference	Haiying Shen et.al.	2503.13773	null
2025-03-17	AccelGen: Heterogeneous SLO-Guaranteed High-Throughput LLM Inference Serving for Diverse Applications	Haiying Shen et.al.	2503.13737	null
2025-03-17	ML-SpecQD: Multi-Level Speculative Decoding with Quantized Drafts	Evangelos Georganas et.al.	2503.13565	null
2025-03-14	Examples as the Prompt: A Scalable Approach for Efficient LLM Adaptation in E-Commerce	Jingying Zeng et.al.	2503.13518	null
2025-03-17	xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference	Maximilian Beck et.al.	2503.13427	link
2025-03-17	VeriLeaky: Navigating IP Protection vs Utility in Fine-Tuning for LLM-Driven Verilog Coding	Zeng Wang et.al.	2503.13116	null
2025-03-15	TFHE-Coder: Evaluating LLM-agentic Fully Homomorphic Encryption Code Generation	Mayank Kumar et.al.	2503.12217	null
2025-03-09	Green Prompting	Marta Adamska et.al.	2503.10666	null
2025-03-13	Collaborative Speculative Inference for Efficient LLM Inference Serving	Luyao Gao et.al.	2503.10325	null
2025-03-12	Prompt Inference Attack on Distributed Large Language Model Inference Frameworks	Xinjian Luo et.al.	2503.09291	null
2025-03-11	TokenSim: Enabling Hardware and Software Exploration for Large Language Model Inference Systems	Feiyang Wu et.al.	2503.08415	link
2025-03-11	Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference	Pol G. Recasens et.al.	2503.08311	null
2025-03-09	Seesaw: High-throughput LLM Inference via Model Re-sharding	Qidong Su et.al.	2503.06433	null
2025-03-07	Optimizing LLM Inference Throughput via Memory-aware and SLA-constrained Dynamic Batching	Bowen Pang et.al.	2503.05248	link
2025-03-07	SpecServe: Efficient and SLO-Aware Large Language Model Serving with Adaptive Speculative Decoding	Kaiyu Huang et.al.	2503.05096	null
2025-03-15	Mark Your LLM: Detecting the Misuse of Open-Source Large Language Models via Watermarking	Yijie Xu et.al.	2503.04636	null
2025-03-06	AOLO: Analysis and Optimization For Low-Carbon Oriented Wireless Large Language Model Services	Xiaoqi Wang et.al.	2503.04418	null
2025-03-06	Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search	Kou Misaki et.al.	2503.04412	null
2025-03-06	Beyond Memorization: Evaluating the True Type Inference Capabilities of LLMs for Java Code Snippets	Yiwen Dong et.al.	2503.04076	null
2025-03-04	FlexInfer: Breaking Memory Constraint via Flexible and Efficient Offloading for On-Device LLM Inference	Hongchao Du et.al.	2503.03777	null
2025-03-05	MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems	Rui Ye et.al.	2503.03686	null
2025-03-04	VQ-LLM: High-performance Code Generation for Vector Quantization Augmented LLM Inference	Zihan Liu et.al.	2503.02236	null
2025-02-26	Online Pseudo-average Shifting Attention (PASA) for Robust Low-precision LLM Inference: Algorithms and Numerical Analysis	Long Cheng et.al.	2503.01873	null
2025-03-03	SAGE: A Framework of Precise Retrieval for RAG	Jintao Zhang et.al.	2503.01713	null
2025-03-03	DILEMMA: Joint LLM Quantization and Distributed LLM Inference Over Edge Computing Systems	Minoo Hosseinzadeh et.al.	2503.01704	null
2025-03-01	Tutorial Proposal: Speculative Decoding for Efficient LLM Inference	Heming Xia et.al.	2503.00491	null
2025-02-28	FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference	Xunhao Lai et.al.	2502.20766	link
2025-02-28	SPD: Sync-Point Drop for efficient tensor parallelism of Large Language Models	Han-Byul Kim et.al.	2502.20727	null
2025-02-27	ECCOS: Efficient Capability and Cost Coordinated Scheduling for Multi-LLM Serving	Kai Mei et.al.	2502.20576	link
2025-02-26	Sparse Brains are Also Adaptive Brains: Cognitive-Load-Aware Dynamic Activation for LLMs	Yiheng Yang et.al.	2502.19078	null
2025-02-24	LLM Inference Acceleration via Efficient Operation Fusion	Mahsa Salmani et.al.	2502.17728	null
2025-02-24	CodeSwift: Accelerating LLM Inference for Efficient Code Generation	Qianhui Zhao et.al.	2502.17139	null
2025-02-24	Make LLM Inference Affordable to Everyone: Augmenting GPU Memory with NDP-DIMM	Lian Liu et.al.	2502.16963	null
2025-02-24	DBudgetKV: Dynamic Budget in KV Cache Compression for Ensuring Optimal Performance	Xuanfan Ni et.al.	2502.16886	null
2025-03-01	CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter	Yepeng Weng et.al.	2502.16880	null
2025-02-23	DISC: Dynamic Decomposition Improves LLM Inference Scaling	Jonathan Light et.al.	2502.16706	null
2025-02-23	TerEffic: Highly Efficient Ternary LLM Inference on FPGA	Chenyang Yin et.al.	2502.16473	null
2025-02-21	KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse	Jingbo Yang et.al.	2502.16002	link
2025-02-21	Towards Swift Serverless LLM Cold Starts with ParaServe	Chiheng Lou et.al.	2502.15524	null
2025-02-24	HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings	Rasmus Aavang et.al.	2502.15411	link
2025-02-24	Round Attention: A Novel Round-Level Attention Mechanism to Accelerate LLM Inference	Yaohua Tang et.al.	2502.15294	null
2025-02-21	A General Pseudonymization Framework for Cloud-Based LLMs: Replacing Privacy Information in Controlled Text Generation	Shilong Hou et.al.	2502.15233	link
2025-02-19	EvoP: Robust LLM Inference via Evolutionary Pruning	Shangyu Wu et.al.	2502.14910	null
2025-02-20	Serving Models, Fast and Slow: Optimizing Heterogeneous LLM Inferencing Workloads at Scale	Shashwat Jaiswal et.al.	2502.14617	null
2025-02-20	SR-LLM: Rethinking the Structured Representation in Large Language Model	Jiahuan Zhang et.al.	2502.14352	null
2025-02-19	RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression	Payman Behnam et.al.	2502.14051	null
2025-02-19	Activation-aware Probe-Query: Effective Key-Value Retrieval for Long-Context LLMs Inference	Qingfa Xiao et.al.	2502.13542	null
2025-02-19	What are Models Thinking about? Understanding Large Language Model Hallucinations "Psychology" through Model Inner State Analysis	Peiran Wang et.al.	2502.13490	null
2025-02-18	BaKlaVa -- Budgeted Allocation of KV cache for Long-context Inference	Ahmed Burak Gulhan et.al.	2502.13176	null
2025-02-18	R2-KG: General-Purpose Dual-Agent Framework for Reliable Reasoning on Knowledge Graphs	Sumin Jo et.al.	2502.12767	link
2025-02-18	HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading	Cheng Luo et.al.	2502.12574	link
2025-02-18	Distributed On-Device LLM Inference With Over-the-Air Computation	Kai Zhang et.al.	2502.12559	null
2025-02-18	SparAMX: Accelerating Compressed LLMs Token Generation on AMX-powered CPUs	Ahmed F. AbouElhamayed et.al.	2502.12444	link
2025-02-17	Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs	Kan Zhu et.al.	2502.12216	null
2025-02-17	Designing Role Vectors to Improve LLM Inference Behaviour	Daniele Potertì et.al.	2502.12055	null
2025-02-17	DiSCo: Device-Server Collaborative LLM-Based Text Streaming Services	Ting Sun et.al.	2502.11417	null
2025-02-17	Evaluating the Performance of the DeepSeek Model in Confidential Computing Environment	Ben Dong et.al.	2502.11347	null
2025-02-16	Diversified Sampling Improves Scaling LLM inference	Tianchun Wang et.al.	2502.11027	null
2025-02-16	Local-Cloud Inference Offloading for LLMs in Multi-Modal, Multi-Task, Multi-Dialogue Settings	Liangqi Yuan et.al.	2502.11007	link
2025-02-15	Pushing up to the Limit of Memory Bandwidth and Capacity Utilization for Efficient LLM Decoding on Embedded FPGA	Jindong Li et.al.	2502.10659	null
2025-02-14	λScale: Enabling Fast Scaling for Serverless Large Language Model Inference	Minchen Yu et.al.	2502.09922	null
2025-02-14	INF^2: High-Throughput Generative Inference of Large Language Models using Near-Storage Processing	Hongsun Jang et.al.	2502.09921	null
2025-02-13	On multi-token prediction for efficient LLM inference	Somesh Mehra et.al.	2502.09419	null
2025-02-13	InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU	Heejun Lee et.al.	2502.08910	null
2025-02-12	Universal Model Routing for Efficient LLM Inference	Wittawat Jitkrittum et.al.	2502.08773	null
2025-02-12	Bridging the Safety Gap: A Guardrail Pipeline for Trustworthy LLM Inferences	Shanshan Han et.al.	2502.08142	null
2025-02-11	HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment	Youhe Jiang et.al.	2502.07903	null
2025-02-11	SHARP: Accelerating Language Model Inference by SHaring Adjacent layers with Recovery Parameters	Yiping Wang et.al.	2502.07832	null
2025-02-11	PIM Is All You Need: A CXL-Enabled GPU-Free System for Large Language Model Inference	Yufeng Gu et.al.	2502.07578	link
2025-02-13	Online Scheduling for LLM Inference with KV Cache Constraints	Patrick Jaillet et.al.	2502.07115	null
2025-02-08	Towards Sustainable NLP: Insights from Benchmarking Inference Energy in Large Language Models	Soham Poddar et.al.	2502.05610	null
2025-02-08	Mechanistic Interpretability of Emotion Inference in Large Language Models	Ala N. Tak et.al.	2502.05489	null
2025-02-07	BCQ: Block Clustered Quantization for 4-bit (W4A4) LLM Inference	Reena Elangovan et.al.	2502.05376	null
2025-02-07	LLM Query Scheduling with Prefix Reuse and Latency Constraints	Gregory Dexter et.al.	2502.04677	null
2025-02-06	WaferLLM: A Wafer-Scale LLM Inference System	Congjie He et.al.	2502.04563	null
2025-02-06	KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference	Xing Li et.al.	2502.04420	link
2025-02-06	CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference	Zehua Pei et.al.	2502.04416	link
2025-02-06	AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference	Qingyue Yang et.al.	2502.04077	link
2025-02-06	Identify Critical KV Cache in LLM Inference from an Output Perturbation Perspective	Yuan Feng et.al.	2502.03805	link
2025-02-06	Adaptive Semantic Prompt Caching with VectorQ	Luis Gaspar Schroeder et.al.	2502.03771	null
2025-02-05	HACK: Homomorphic Acceleration via Compression of the Key-Value Cache for Disaggregated LLM Inference	Zeyu Zhang et.al.	2502.03589	null
2025-02-05	Accessible and Portable LLM Inference by Compiling Computational Graphs into SQL	Wenbo Sun et.al.	2502.02818	null
2025-02-05	Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation	Jingyu Liu et.al.	2502.02789	link
2025-02-04	EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization	Yize Wu et.al.	2502.02493	null
2025-01-30	Fine-tuning LLaMA 2 interference: a comparative study of language implementations for optimal efficiency	Sazzad Hossain et.al.	2502.01651	null
2025-02-06	An Investigation of FP8 Across Accelerators for LLM Inference	Jiwoo Kim et.al.	2502.01070	null
2025-02-02	Huff-LLM: End-to-End Lossless Compression for Efficient LLM Inference	Patrick Yubeaton et.al.	2502.00922	null
2025-02-02	SecPE: Secure Prompt Ensembling for Private and Robust Large Language Models	Jiawen Zhang et.al.	2502.00847	null
2025-02-01	UniAttn: Reducing Inference Costs via Softmax Unification for Post-Training LLMs	Yizhe Xiong et.al.	2502.00439	null
2025-02-01	ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference	Xiang Liu et.al.	2502.00299	null
2025-01-31	Pheromone-based Learning of Optimal Reasoning Paths	Anirudh Chari et.al.	2501.19278	null
2025-02-02	RotateKV: Accurate and Robust 2-Bit KV Cache Quantization for LLMs via Outlier-Aware Adaptive Rotations	Zunhai Su et.al.	2501.16383	link
2025-01-27	Raiders of the Lost Dependency: Fixing Dependency Conflicts in Python using LLMs	Antony Bartlett et.al.	2501.16191	null
2025-01-27	TOPLOC: A Locality Sensitive Hashing Scheme for Trustless Verifiable Inference	Jack Min Ong et.al.	2501.16007	null
2025-01-27	Aging-aware CPU Core Management for Embodied Carbon Amortization in Cloud LLM Inference	Tharindu B. Hewage et.al.	2501.15829	link
2025-01-25	Task-KV: Task-aware KV Cache Optimization via Semantic Differentiation of Attention Heads	Xingyang He et.al.	2501.15113	null
2025-01-24	Locality-aware Fair Scheduling in LLM Serving	Shiyi Cao et.al.	2501.14312	null
2025-01-20	Glinthawk: A Two-Tiered Architecture for High-Throughput LLM Inference	Pouya Hamadanian et.al.	2501.11779	link
2025-01-20	Whose Boat Does it Float? Improving Personalization in Preference Tuning via Inferred User Personas	Nishant Balepur et.al.	2501.11549	link
2025-01-19	GREEN-CODE: Optimizing Energy Efficiency in Large Language Models for Code Generation	Shashikant Ilager et.al.	2501.11006	link
2025-01-17	A Survey on LLM Test-Time Compute via Search: Tasks, LLM Profiling, Search Algorithms, and Relevant Frameworks	Xinzhe Li et.al.	2501.10069	link
2025-01-17	PICE: A Semantic-Driven Progressive Inference System for LLM Serving in Cloud-Edge Networks	Huiyou Zhan et.al.	2501.09367	null
2025-01-16	Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition	Takaaki Hori et.al.	2501.09258	null
2025-01-15	Guiding Retrieval using LLM-based Listwise Rankers	Mandeep Rathee et.al.	2501.09186	link
2025-01-14	Investigating Energy Efficiency and Performance Trade-offs in LLM Inference Across Tasks and DVFS Settings	Paul Joe Maliakel et.al.	2501.08219	null
2025-01-14	PRESERVE: Prefetching Model Weights and KV-Cache in Distributed LLM Serving	Ahmet Caner Yüzügüler et.al.	2501.08192	null
2025-01-14	Hierarchical Autoscaling for Large Language Model Serving with Chiron	Archit Patke et.al.	2501.08090	null
2025-01-12	MPCache: MPC-Friendly KV Cache Eviction for Efficient Private Large Language Model Inference	Wenxuan Zeng et.al.	2501.06807	null
2025-01-05	TAPAS: Thermal- and Power-Aware Scheduling for LLM Inference in Cloud Platforms	Jovan Stojkovic et.al.	2501.02600	null
2025-01-04	AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference	Zhuomin He et.al.	2501.02336	link
2025-01-03	Efficient LLM Inference with Activation Checkpointing and Hybrid Caching	Sanghyeon Lee et.al.	2501.01792	null
2025-01-03	BlockDialect: Block-wise Fine-grained Mixed Format for Energy-Efficient LLM Inference	Wonsuk Jang et.al.	2501.01144	link
2025-01-02	FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving	Zihao Ye et.al.	2501.01005	link
2024-12-23	Highly Optimized Kernels and Fine-Grained Codebooks for LLM Inference on Arm CPUs	Dibakar Gope et.al.	2501.00032	link
2024-12-29	TokenRing: An Efficient Parallelism Framework for Infinite-Context LLMs via Bidirectional Communication	Zongwu Wang et.al.	2412.20501	link
2024-12-28	LoL-PIM: Long-Context LLM Decoding with Scalable DRAM-PIM System	Hyucksung Kwon et.al.	2412.20166	null
2024-12-19	GFormer: Accelerating Large Language Models with Optimized Transformers on Gaudi Processors	Chengming Zhang et.al.	2412.19829	null
2025-01-02	A Survey on Large Language Model Acceleration based on KV Cache Management	Haoyang Li et.al.	2412.19442	link
2024-12-27	An Engorgio Prompt Makes Large Language Model Babble on	Jianshuo Dong et.al.	2412.19394	link
2024-12-25	Dovetail: A CPU/GPU Heterogeneous Speculative Decoding for LLM inference	Libo Zhang et.al.	2412.18934	null
2024-12-21	SYMPHONY: Improving Memory Management for LLM Inference Workloads	Saurabh Agarwal et.al.	2412.16434	null
2024-12-20	WebLLM: A High-Performance In-Browser LLM Inference Engine	Charlie F. Ruan et.al.	2412.15803	link
2024-12-18	A Survey on LLM Inference-Time Self-Improvement	Xiangjue Dong et.al.	2412.14352	link
2024-12-18	Uncertainty-Aware Hybrid Inference with On-Device Small and Remote Large Language Models	Seungeun Oh et.al.	2412.12687	null
2024-12-17	A System for Microserving of LLMs	Hongyi Jin et.al.	2412.12488	null
2024-12-16	CSR: Achieving 1 Bit Key-Value Cache via Sparse Representation	Hongxuan Zhang et.al.	2412.11741	null
2024-12-15	Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning	Yun Qu et.al.	2412.11120	link
2024-12-15	NITRO: LLM Inference on Intel Laptop NPUs	Anthony Fei et.al.	2412.11053	link
2024-12-13	SCBench: A KV Cache-Centric Analysis of Long-Context Methods	Yucheng Li et.al.	2412.10319	null
2024-12-17	TurboAttention: Efficient Attention Approximation For High Throughputs LLMs	Hao Kang et.al.	2412.08585	null
2024-12-11	Lachesis: Predicting LLM Inference Accuracy using Structural Properties of Reasoning Paths	Naryeong Kim et.al.	2412.08281	null
2024-12-12	TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch	Xingchen Song et.al.	2412.08237	null
2024-12-09	Asynchronous LLM Function Calling	In Gim et.al.	2412.07017	null
2024-12-09	SparseAccelerate: Efficient Long-Context Inference for Mid-Range GPUs	James Vo et.al.	2412.06198	null
2024-12-08	XKV: Personalized KV Cache Memory Reduction for Long-Context LLM Inference	Weizhuo Li et.al.	2412.05896	null
2024-12-06	GUIDE: A Global Unified Inference Engine for Deploying Large Language Models in Heterogeneous Environments	Yanyu Chen et.al.	2412.04788	null
2024-12-03	Multi-Bin Batching for Increasing LLM Inference Throughput	Ozgur Guldogan et.al.	2412.04504	null
2024-11-29	BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching	Zhen Zheng et.al.	2412.03594	null
2024-12-03	Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity	Da Ma et.al.	2412.02252	null
2024-12-02	PLD+: Accelerating LLM inference by leveraging Language Model Artifacts	Shwetha Somasundaram et.al.	2412.01447	null
2024-12-02	Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking	Marco Federici et.al.	2412.01380	null
2024-12-05	RILQ: Rank-Insensitive LoRA-based Quantization Error Compensation for Boosting 2-bit Large Language Model Accuracy	Geonho Lee et.al.	2412.01129	link
2024-12-02	TruncFormer: Private LLM Inference Using Only Truncations	Patrick Yubeaton et.al.	2412.01042	null
2024-11-29	A dynamic parallel method for performance optimization on hybrid CPUs	Luo Yu et.al.	2411.19542	null
2024-12-03	Puzzle: Distillation-Based NAS for Inference-Optimized LLMs	Akhiad Bercovich et.al.	2411.19146	null
2024-11-29	InputSnatch: Stealing Input in LLM Services via Timing Side-Channel Attacks	Xinyao Zheng et.al.	2411.18191	null
2024-11-28	MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache	Akshat Sharma et.al.	2411.18077	null
2024-11-24	Chameleon: Adaptive Caching and Scheduling for Many-Adapter LLM Inference Environments	Nikoleta Iliakopoulou et.al.	2411.17741	null
2024-11-26	PIM-AI: A Novel Architecture for High-Efficiency LLM Inference	Cristobal Ortega et.al.	2411.17309	null
2024-11-26	Star Attention: Efficient LLM Inference over Long Sequences	Shantanu Acharya et.al.	2411.17116	link
2024-11-26	Efficient LLM Inference with I/O-Aware Partial KV Cache Recomputation	Chaoyi Jiang et.al.	2411.17089	link
2024-11-25	MixPE: Quantization and Hardware Co-design for Efficient LLM Inference	Yu Zhang et.al.	2411.16158	null
2024-11-24	eFedLLM: Efficient LLM Inference Based on Federated Learning	Shengwen Ding et.al.	2411.16003	null
2024-11-24	Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format	Chao Fang et.al.	2411.15982	null
2024-11-24	Task Scheduling for Efficient Inference of Large Language Models on Single Moderate GPU Systems	Wenxiang Lin et.al.	2411.15715	null
2024-11-22	XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models	Yixin Dong et.al.	2411.15100	null
2024-11-21	Disentangling Memory and Reasoning Ability in Large Language Models	Mingyu Jin et.al.	2411.13504	link
2024-11-20	Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding	Hyun Ryu et.al.	2411.13157	null
2024-11-21	LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts	Zhuohan Gu et.al.	2411.13009	null
2024-11-15	An exploration of the effect of quantisation on energy consumption and inference time of StarCoder2	Pepijn de Reus et.al.	2411.12758	link
2024-11-19	SparseInfer: Training-free Prediction of Activation Sparsity for Fast LLM Inference	Jiho Shin et.al.	2411.12692	null
2024-11-18	MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs	Shiyi Cao et.al.	2411.11217	null
2024-11-15	AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference	Janghwan Lee et.al.	2411.09909	null
2024-11-14	Squeezed Attention: Accelerating Long Context Length LLM Inference	Coleman Hooper et.al.	2411.09688	link
2024-11-15	Communication Compression for Tensor Parallel LLM Inference	Jan Hansen-Palmus et.al.	2411.09510	null
2024-11-14	Pie: Pooling CPU Memory for LLM Inference	Yi Xu et.al.	2411.09317	null
2024-11-12	Towards Low-bit Communication for Tensor Parallel LLM Inference	Harry Dong et.al.	2411.07942	null
2024-11-12	The Effect of Scheduling and Preemption on the Efficiency of LLM Inference Serving	Kyoungmin Kim et.al.	2411.07447	null
2024-11-08	AcceLLM: Accelerating LLM Inference using Redundancy for Load Balancing and Data Locality	Ilias Bournias et.al.	2411.05555	null
2024-11-07	Hardware and Software Platform Inference	Cheng Zhang et.al.	2411.05197	null
2024-11-07	SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference	Gabriele Oliaro et.al.	2411.04975	link
2024-11-05	CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration	Hongpeng Jin et.al.	2411.02829	null
2024-11-04	RAGViz: Diagnose and Visualize Retrieval-Augmented Generation	Tevin Wang et.al.	2411.01751	link
2024-11-06	HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference	Peng Tang et.al.	2411.01433	null
2024-11-02	RA-WEBs: Remote Attestation for WEB services	Kosei Akama et.al.	2411.01340	null
2024-11-02	NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference	Xuanlin Jiang et.al.	2411.01142	null
2024-11-01	LLM-Based Misconfiguration Detection for AWS Serverless Computing	Jinfeng Wen et.al.	2411.00642	null
2024-11-04	ReverseNER: A Self-Generated Example-Driven Framework for Zero-Shot Named Entity Recognition with Large Language Models	Anbang Wang et.al.	2411.00533	null
2024-11-01	Attention Tracker: Detecting Prompt Injection Attacks in LLMs	Kuo-Han Hung et.al.	2411.00348	null
2024-10-31	LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators	Krishna Teja Chitty-Venkata et.al.	2411.00136	link
2024-10-31	Interpretable Language Modeling via Induction-head Ngram Models	Eunji Kim et.al.	2411.00066	link
2024-10-31	ALISE: Accelerating Large Language Model Serving with Speculative Scheduling	Youpeng Zhao et.al.	2410.23537	null
2024-10-30	BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference	Junqi Zhao et.al.	2410.23079	link
2024-10-29	Scaling LLM Inference with Optimized Sample Compute Allocation	Kexun Zhang et.al.	2410.22480	link
2024-10-29	SVIP: Towards Verifiable Inference of Open-source Large Language Models	Yifan Sun et.al.	2410.22307	null
2024-10-28	ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference	Hanshi Sun et.al.	2410.21465	link
2024-10-27	FIRP: Faster LLM inference via future intermediate representation prediction	Pengfei Wu et.al.	2410.20488	null
2024-10-29	Ripple: Accelerating LLM Inference on Smartphones with Correlation-Aware Neuron Management	Tuowei Wang et.al.	2410.19274	null
2024-10-24	Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design	Ruisi Cai et.al.	2410.19123	link
2024-10-30	Dynamic Vocabulary Pruning in Early-Exit LLMs	Jort Vincenti et.al.	2410.18952	link
2024-10-25	A Survey on Speech Large Language Models	Jing Peng et.al.	2410.18908	null
2024-10-24	BATON: Enhancing Batch-wise Inference Efficiency for Large Language Models via Dynamic Re-batching	Peizhuang Cong et.al.	2410.18701	null
2024-10-25	Fast Inference for Augmented Large Language Models	Rana Shahout et.al.	2410.18248	null
2024-10-23	POD-Attention: Unlocking Full Prefill-Decode Overlap for Faster LLM Inference	Aditya K Kamath et.al.	2410.18038	link
2024-10-22	FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs	Haoran Lin et.al.	2410.16663	null
2024-10-22	Distill-SynthKG: Distilling Knowledge Graph Synthesis Workflow for Improved Coverage and Efficiency	Prafulla Kumar Choubey et.al.	2410.16597	null
2024-10-20	EPIC: Efficient Position-Independent Context Caching for Serving Large Language Models	Junhao Hu et.al.	2410.15332	null
2024-10-19	IANUS: Integrated Accelerator based on NPU-PIM Unified Memory System	Minseok Seo et.al.	2410.15008	null
2024-10-23	Harnessing Your DRAM and SSD for Sustainable and Accessible LLM Inference with Mixed-Precision and Multi-level Caching	Jie Peng et.al.	2410.14740	null
2024-10-18	A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference	You Wu et.al.	2410.14442	link
2024-10-18	Revisiting SLO and Goodput Metrics in LLM Serving	Zhibin Wang et.al.	2410.14257	null
2024-10-17	RiTeK: A Dataset for Large Language Models Complex Reasoning over Textual Knowledge Graphs	Jiatan Huang et.al.	2410.13987	null
2024-10-17	Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs	Tianyu Guo et.al.	2410.13835	link
2024-10-17	Progressive Mixed-Precision Decoding for Efficient LLM Inference	Hao Mark Chen et.al.	2410.13461	null
2024-10-17	Data Defenses Against Large Language Models	William Agnew et.al.	2410.13138	link
2024-10-19	In-context KV-Cache Eviction for LLMs via Attention-Gate	Zihao Zeng et.al.	2410.12876	null
2024-10-10	RecurFormer: Not All Transformer Heads Need Self-Attention	Ruiqing Yan et.al.	2410.12850	null
2024-10-16	Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning	Huiwen Wu et.al.	2410.12130	null
2024-10-15	Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix	Yingyu Liang et.al.	2410.11261	null
2024-10-14	DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads	Guangxuan Xiao et.al.	2410.10819	link
2024-10-16	SplitLLM: Collaborative Inference of LLMs for Model Placement and Throughput Optimization	Akrit Mudvari et.al.	2410.10759	null
2024-10-12	Power-Softmax: Towards Secure LLM Inference over Encrypted Data	Itamar Zimerman et.al.	2410.09457	null
2024-10-09	SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration	Heming Xia et.al.	2410.06916	link
2024-10-08	ParallelSpec: Parallel Drafter for Efficient Speculative Decoding	Zilin Xiao et.al.	2410.05589	null
2024-10-06	RevMUX: Data Multiplexing with Reversible Adapters for Efficient LLM Batch Inference	Yige Xu et.al.	2410.04519	link
2024-10-14	Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective	Jinhao Li et.al.	2410.04466	link
2024-10-04	SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation	Aurick Qiao et.al.	2410.03960	null
2024-10-04	EXAQ: Exponent Aware Quantization For LLMs Acceleration	Moran Shkolnik et.al.	2410.03185	link
2024-10-03	LLMCO2: Advancing Accurate Carbon Footprint Prediction for LLM Inferences	Zhenxiao Fu et.al.	2410.02950	null
2024-10-03	Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration	Yun Qu et.al.	2410.02511	link
2024-10-04	LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services	Małgorzata Łazuka et.al.	2410.02425	null
2024-10-04	Buckle Up: Robustifying LLMs at Every Customization Stage via Data Curation	Xiaoqun Liu et.al.	2410.02220	null
2024-10-02	Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads	Yuxiang Huang et.al.	2410.01805	link
2024-10-02	ConServe: Harvesting GPUs for Low-Latency and High-Throughput Large Language Model Serving	Yifan Qiao et.al.	2410.01228	null
2024-10-02	TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices	Zonghang Li et.al.	2410.00531	null
2024-09-30	The Early Bird Catches the Leak: Unveiling Timing Side Channels in LLM Serving Systems	Linke Song et.al.	2409.20002	null
2024-09-26	Control Industrial Automation System with Large Language Models	Yuchen Xia et.al.	2409.18009	link
2024-09-26	Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores	Shaobo Ma et.al.	2409.17870	null
2024-09-25	Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction	Zhenmei Shi et.al.	2409.17422	link
2024-09-25	Mnemosyne: Parallelization Strategies for Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations	Amey Agrawal et.al.	2409.17264	null
2024-09-25	Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference	Zongyue Qin et.al.	2409.16560	null
2024-09-25	AlignedKV: Reducing Memory Access of KV-Cache with Precision-Aligned Quantization	Yifan Tan et.al.	2409.16546	link
2024-09-23	Eagle: Efficient Training-Free Router for Multi-LLM Inference	Zesen Zhao et.al.	2409.15518	null
2024-09-24	UELLM: A Unified and Efficient Approach for LLM Inference Serving	Yiyuan He et.al.	2409.14961	null
2024-09-22	RACOON: An LLM-based Framework for Retrieval-Augmented Column Type Annotation with a Knowledge Graph	Linxi Wei et.al.	2409.14556	null
2024-09-16	Do Large Language Models Need a Content Delivery Network?	Yihua Cheng et.al.	2409.13761	link
2024-09-19	PromSec: Prompt Optimization for Secure Generation of Functional Source Code with Large Language Models (LLMs)	Mahmoud Nazzal et.al.	2409.12699	link
2024-09-12	LlamaF: An Efficient Llama2 Architecture Accelerator on Embedded FPGAs	Han Xu et.al.	2409.11424	null
2024-09-04	ISO: Overlap of Computation and Communication within Seqenence For LLM Inference	Bin Xiao et.al.	2409.11155	null
2024-09-18	RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval	Di Liu et.al.	2409.10516	link
2024-09-08	InstInfer: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference	Xiurui Pan et.al.	2409.04992	null
2024-09-07	Achieving Peak Performance for Large Language Models: A Systematic Review	Zhyar Rzgar K Rostam et.al.	2409.04833	null
2024-09-06	A First Look At Efficient And Secure On-Device LLM Inference Against KV Leakage	Huan Yang et.al.	2409.04040	null
2024-09-13	Confidential Computing on nVIDIA H100 GPU: A Performance Benchmark Study	Jianwei Zhu et.al.	2409.03992	null
2024-09-05	Sirius: Contextual Sparsity with Correction for Efficient LLMs	Yang Zhou et.al.	2409.03856	link
2024-08-31	HSF: Defending against Jailbreak Attacks with Hidden State Filtering	Cheng Qian et.al.	2409.03788	null
2024-09-03	Contemporary Model Compression on Large Language Models Inference	Dong Liu et.al.	2409.01990	link
2024-09-02	CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification	Junhui He et.al.	2409.01366	link
2024-09-04	Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference	Barys Liskavets et.al.	2409.01227	link
2024-09-01	Research on LLM Acceleration Using the High-Performance RISC-V Processor "Xiangshan" (Nanhu Version) Based on the Open-Source Matrix Instruction Set Extension (Vector Dot Product)	Xu-Hao Chen et.al.	2409.00661	null
2024-08-28	Decentralized LLM Inference over Edge Networks with Energy Harvesting	Aria Khoshsirat et.al.	2408.15907	null
2024-08-28	Efficient LLM Scheduling by Learning to Rank	Yichao Fu et.al.	2408.15792	link
2024-08-28	Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation	Lujun Gui et.al.	2408.15562	null
2024-08-22	NanoFlow: Towards Optimal Large Language Model Serving Throughput	Kan Zhu et.al.	2408.12757	link
2024-09-04	Parallel Speculative Decoding with Adaptive Draft Length	Tianyu Liu et.al.	2408.11850	link
2024-08-21	MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models	Elias Frantar et.al.	2408.11743	link
2024-08-20	Unconditional Truthfulness: Learning Conditional Dependency for Uncertainty Quantification of Large Language Models	Artem Vazhentsev et.al.	2408.10692	null
2024-08-19	PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars	Sumanth Prabhu et.al.	2408.08869	null
2024-08-23	ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models	Chao Zeng et.al.	2408.08554	link
2024-08-14	LPU: A Latency-Optimized and Highly Scalable Processor for Large Language Model Inference	Seungjae Moon et.al.	2408.07326	null
2024-08-12	LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration	Zhiwen Mo et.al.	2408.06003	null
2024-08-10	LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale	Jaehong Cho et.al.	2408.05499	link
2024-08-05	SLO-aware GPU Frequency Scaling for Energy Efficient LLM Inference Serving	Andreas Kosmas Kakolyris et.al.	2408.05235	null
2024-08-08	Towards SLO-Optimized LLM Serving via Automatic Inference Engine Tuning	Ke Cheng et.al.	2408.04323	null
2024-08-07	Zero-Delay QKV Compression for Mitigating KV Cache and Network Bottlenecks in LLM Inference	Zeyu Zhang et.al.	2408.04107	null
2024-08-07	MPC-Minimized Secure LLM Inference	Deevashwer Rathee et.al.	2408.03561	null
2024-08-05	Generative AI as a Service in 6G Edge-Cloud: Generation Task Offloading by In-context Learning	Hao Zhou et.al.	2408.02549	null
2024-08-02	The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines	Matias Martinez et.al.	2408.01050	null
2024-08-01	DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency	Jovan Stojkovic et.al.	2408.00741	null
2024-08-01	Designing Efficient LLM Accelerators for Edge Devices	Jude Haris et.al.	2408.00462	null
2024-08-01	Large Language Model (LLM)-enabled In-context Learning for Wireless Network Optimization: A Case Study of Power Control	Hao Zhou et.al.	2408.00214	null
2024-07-23	ScaleLLM: A Resource-Frugal LLM Serving Framework by Optimizing End-to-End Efficiency	Yuhang Yao et.al.	2408.00008	null
2024-08-01	Responsive ML inference in multi-tenanted environments using AQUA	Abhishek Vijaya Kumar et.al.	2407.21255	null
2024-07-25	An Efficient Inference Framework for Early-exit Large Language Models	Ruijie Miao et.al.	2407.20272	null
2024-07-29	Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost	Sania Nayab et.al.	2407.19825	null
2024-07-29	Teaching LLMs at Charles University: Assignments and Activities	Jindřich Helcl et.al.	2407.19798	null
2024-07-22	RazorAttention: Efficient KV Cache Compression Through Retrieval Heads	Hanlin Tang et.al.	2407.15891	null
2024-07-22	vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving	Jiale Xu et.al.	2407.15309	link
2024-07-19	LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference	Qichen Fu et.al.	2407.14057	null
2024-07-17	Struct-X: Enhancing Large Language Models Reasoning with Structured Data	Xiaoyu Tan et.al.	2407.12522	null
2024-07-17	LLM Inference Serving: Survey of Recent Advances and Opportunities	Baolin Li et.al.	2407.12391	null
2024-07-17	Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models	Ayush Kaushal et.al.	2407.12327	link
2024-07-16	PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation	Branden Butler et.al.	2407.11798	null
2024-07-21	Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference	Yuan Feng et.al.	2407.11550	link
2024-07-15	Fast Matrix Multiplications for Lookup Table-Quantized LLMs	Han Guo et.al.	2407.10960	link
2024-07-12	Multi-Token Joint Speculative Decoding for Accelerating Large Language Model Inference	Zongyue Qin et.al.	2407.09722	null
2024-09-02	Etalon: Holistic Performance Evaluation Framework for LLM Inference Systems	Amey Agrawal et.al.	2407.07000	null
2024-07-08	Empowering 1000 tokens/second on-device LLM prefilling with mllm-NPU	Daliang Xu et.al.	2407.05858	link
2024-07-07	A Queueing Theoretic Perspective on Low-Latency LLM Inference with Variable Token Length	Yuqing Yang et.al.	2407.05347	null
2024-07-05	Corki: Enabling Real-time Embodied AI Robots via Algorithm-Architecture Co-Design	Yiyang Huang et.al.	2407.04292	link
2024-07-04	Offline Energy-Optimal LLM Serving: Workload-Based Energy Models for LLM Inference on Heterogeneous Systems	Grant Wilkins et.al.	2407.04014	null
2024-07-02	MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention	Huiqiang Jiang et.al.	2407.02490	link
2024-06-29	Teola: Towards End-to-End Optimization of LLM-based Applications	Xin Tan et.al.	2407.00326	link
2024-06-25	T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge	Jianyu Wei et.al.	2407.00088	link
2024-06-28	InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management	Wonbeom Lee et.al.	2406.19707	null
2024-06-24	Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters	Euiin Yi et.al.	2406.16758	link
2024-06-28	SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention	Qianchao Zhu et.al.	2406.15486	null
2024-06-21	Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models	Qi Liu et.al.	2406.14848	link
2024-06-20	Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data	Johannes Treutlein et.al.	2406.14546	link
2024-06-20	LiveMind: Low-latency Large Language Models with Simultaneous Inference	Chuangtao Chen et.al.	2406.14319	link
2024-06-19	SDQ: Sparse Decomposed Quantization for LLM Inference	Geonhwa Jeong et.al.	2406.13868	null
2024-06-19	Amphista: Accelerate LLM Inference with Bi-directional Multiple Drafting Heads in a Non-autoregressive Style	Zeping Li et.al.	2406.13170	null
2024-06-16	Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization	Jungi Lee et.al.	2406.12930	null
2024-06-18	LightPAL: Lightweight Passage Retrieval for Open Domain Multi-Document Summarization	Masafumi Enomoto et.al.	2406.12494	null
2024-06-18	LLMs Are Prone to Fallacies in Causal Inference	Nitish Joshi et.al.	2406.12158	null
2024-06-14	Unraveling the Mechanics of Learning-Based Demonstration Selection for In-Context Learning	Hui Liu et.al.	2406.11890	null
2024-06-17	Endor: Hardware-Friendly Sparse Format for Offloaded LLM Inference	Donghyeon Joo et.al.	2406.11674	null
2024-06-17	QTIP: Quantization with Trellises and Incoherence Processing	Albert Tseng et.al.	2406.11235	link
2024-06-16	New Solutions on LLM Acceleration, Optimization, and Application	Yingbing Huang et.al.	2406.10903	null
2024-06-16	Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference	Jiaming Tang et.al.	2406.10774	link
2024-06-15	Large Language Models as Surrogate Models in Evolutionary Algorithms: A Preliminary Study	Hao Hao et.al.	2406.10675	link
2024-06-08	QCQA: Quality and Capacity-aware grouped Query Attention	Vinay Joshi et.al.	2406.10247	null
2024-06-12	Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference	Christopher Wolters et.al.	2406.08413	null
2024-06-12	PowerInfer-2: Fast Large Language Model Inference on a Smartphone	Zhenliang Xue et.al.	2406.06282	null
2024-06-09	A Superalignment Framework in Autonomous Driving with Large Language Models	Xiangrui Kong et.al.	2406.05651	null
2024-06-06	Speculative Decoding via Early-exiting for Faster LLM Inference with Thompson Sampling Control Mechanism	Jiahao Liu et.al.	2406.03853	null
2024-06-04	Language Models can Infer Action Semantics for Classical Planners from Environment Feedback	Wang Zhu et.al.	2406.02791	null
2024-06-08	Adaptive Layer Splitting for Wireless LLM Inference in Edge Computing: A Model-Based Reinforcement Learning Approach	Yuxuan Chen et.al.	2406.02616	null
2024-06-04	SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices	Ruslan Svirschevski et.al.	2406.02532	link
2024-06-03	Demystifying Platform Requirements for Diverse LLM Inference Use Cases	Abhimanyu Bambhaniya et.al.	2406.01698	link
2024-06-03	PrivacyRestore: Privacy-Preserving Inference in Large Language Models via Privacy Removal and Restoration	Ziqian Zeng et.al.	2406.01394	null
2024-06-01	A Practice-Friendly Two-Stage LLM-Enhanced Paradigm in Sequential Recommendation	Dugang Liu et.al.	2406.00333	null
2024-05-31	No Free Lunch Theorem for Privacy-Preserving LLM Inference	Xiaojin Zhang et.al.	2405.20681	null
2024-05-30	Decentralized AI: Permissionless LLM Inference on POKT Network	Daniel Olshansky et.al.	2405.20450	null
2024-06-01	S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs	Wei Zhong et.al.	2405.20314	null
2024-05-30	Deciphering Human Mobility: Inferring Semantics of Trajectories with Large Language Models	Yuxiao Luo et.al.	2405.19850	null
2024-05-29	MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models	Taehyun Kim et.al.	2405.18832	null
2024-05-29	PermLLM: Private Inference of Large Language Models within 3 Seconds under WAN	Fei Zheng et.al.	2405.18744	null
2024-06-02	Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference	Hao Mark Chen et.al.	2405.18628	link
2024-05-25	FastQuery: Communication-efficient Embedding Table Query for Private LLM Inference	Chenqi Lin et.al.	2405.16241	null
2024-05-23	EdgeShard: Efficient LLM Inference via Collaborative Edge Computing	Mingjin Zhang et.al.	2405.14371	null
2024-05-23	MiniCache: KV Cache Compression in Depth Dimension for Large Language Models	Akide Liu et.al.	2405.14366	null
2024-05-21	PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference	Dongjie Yang et.al.	2405.12532	null
2024-05-12	Edge Intelligence Optimization for Large Language Model Inference with Batching and Quantization	Xinyuan Zhang et.al.	2405.07140	null
2024-05-11	Aladdin: Joint Placement and Scaling for SLO-Aware LLM Serving	Chengyi Nie et.al.	2405.06856	null
2024-05-21	Vidur: A Large-Scale Simulation Framework For LLM Inference	Amey Agrawal et.al.	2405.05465	link
2024-05-13	KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation	Minsik Cho et.al.	2405.05329	null
2024-05-12	DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer's Disease Questions with Scientific Literature	Dawei Li et.al.	2405.04819	link
2024-05-10	QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving	Yujun Lin et.al.	2405.04532	link
2024-05-07	vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention	Ramya Prabhu et.al.	2405.04437	link
2024-05-07	Optimizing Language Model's Reasoning Abilities with Weak Supervision	Yongqi Tong et.al.	2405.04086	null
2024-05-06	AlphaMath Almost Zero: Process Supervision without Process	Guoxin Chen et.al.	2405.03553	link
2024-05-03	Efficient and Economic Large Language Model Inference with Attention Offloading	Shaoyuan Chen et.al.	2405.01814	null

MoE

Publish Date	Title	Authors	PDF	Code
2026-03-06	RAMoEA-QA: Hierarchical Specialization for Robust Respiratory Audio Question Answering	Gaia A. Bertolino et.al.	2603.06542	null
2026-03-06	A Mixture-of-Experts Framework for Practical Hybrid-Quantum Models in Credit Card Fraud Detection	Rodrigo Chaves et.al.	2603.06473	null
2026-03-06	MoEMambaMIL: Structure-Aware Selective State Space Modeling for Whole-Slide Image Analysis	Dongqing Xie et.al.	2603.06378	null
2026-03-06	MoEless: Efficient MoE LLM Serving via Serverless Computing	Hanfei Yu et.al.	2603.06350	null
2026-03-06	WMoE-CLIP: Wavelet-Enhanced Mixture-of-Experts Prompt Learning for Zero-Shot Anomaly Detection	Peng Chen et.al.	2603.06313	null
2026-03-06	GazeMoE: Perception of Gaze Target with Mixture-of-Experts	Zhuangzhuang Dai et.al.	2603.06256	null
2026-03-06	EvoESAP: Non-Uniform Expert Pruning for Sparse MoE	Zongfang Liu et.al.	2603.06003	null
2026-03-06	MoE Lens -- An Expert Is All You Need	Marmik Chaudhari et.al.	2603.05806	null
2026-03-06	Sparse Crosscoders for diffing MoEs and Dense models	Marmik Chaudhari et.al.	2603.05805	null
2026-03-05	Change Point Detection for Cell Populations Measured via Flow Cytometry	Yik Lun Kei et.al.	2603.05700	null
2026-03-05	NeuronMoE: Neuron-Guided Mixture-of-Experts for Efficient Multilingual LLM Extension	Rongzhi Li et.al.	2603.05046	null
2026-03-05	Mixture of Universal Experts: Scaling Virtual Width via Depth-Width Transformation	Yilong Chen et.al.	2603.04971	null
2026-03-05	Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling	Yong Liu et.al.	2603.04791	null
2026-03-05	TSEmbed: Unlocking Task Scaling in Universal Multimodal Embeddings	Yebo Wu et.al.	2603.04772	null
2026-03-04	ECG-MoE: Mixture-of-Expert Electrocardiogram Foundation Model	Yuhao Xu et.al.	2603.04589	null
2026-03-04	Augmenting representations with scientific papers	Nicolò Oreste Pinciroli Vago et.al.	2603.04516	null
2026-03-04	Benchmarking Quantum Computers via Protocols, Comparing IBM's Heron vs IBM's Eagle	Nitay Mayo et.al.	2603.04377	null
2026-03-04	RANGER: Sparsely-Gated Mixture-of-Experts with Adaptive Retrieval Re-ranking for Pathology Report Generation	Yixin Chen et.al.	2603.04348	null
2026-03-04	CAMMSR: Category-Guided Attentive Mixture of Experts for Multimodal Sequential Recommendation	Jinfeng Xu et.al.	2603.04320	null
2026-03-04	UniRain: Unified Image Deraining with RAG-based Dataset Distillation and Multi-objective Reweighted Optimization	Qianfeng Yang et.al.	2603.03967	null
2026-03-03	Modeling Cross-vision Synergy for Unified Large Vision Model	Shengqiong Wu et.al.	2603.03564	null
2026-03-03	Beyond Language Modeling: An Exploration of Multimodal Pretraining	Shengbang Tong et.al.	2603.03276	null
2026-03-04	MoECLIP: Patch-Specialized Experts for Zero-shot Anomaly Detection	Jun Yeong Park et.al.	2603.03101	null
2026-03-03	CMoE: Contrastive Mixture of Experts for Motion Control and Terrain Adaptation of Humanoid Robots	Shihao Ma et.al.	2603.03067	null
2026-03-03	EduVQA: Benchmarking AI-Generated Video Quality Assessment for Education	Baoliang Chen et.al.	2603.03066	null
2026-03-03	Practical FP4 Training for Large-Scale MoE Models on Hopper GPUs	Wuyue Zhang et.al.	2603.02731	null
2026-03-03	TenExp: Mixture-of-Experts-Based Tensor Decomposition Structure Search Framework	Ting-Wei Zhou et.al.	2603.02720	null
2026-03-03	MiM-DiT: MoE in MoE with Diffusion Transformers for All-in-One Image Restoration	Lingshun Kong et.al.	2603.02710	null
2026-03-03	Addressing Missing and Noisy Modalities in One Solution: Unified Modality-Quality Framework for Low-quality Multimodal Data	Sijie Mai et.al.	2603.02695	null
2026-03-03	Robust Heterogeneous Analog-Digital Computing for Mixture-of-Experts Models with Theoretical Generalization Guarantees	Mohammed Nowaz Rabbani Chowdhury et.al.	2603.02633	null
2026-03-02	DynaMoE: Dynamic Token-Level Expert Activation with Layer-Wise Adaptive Capacity for Mixture-of-Experts Neural Networks	Gökdeniz Gülmez et.al.	2603.01697	null
2026-03-02	PathMoE: Interpretable Multimodal Interaction Experts for Pediatric Brain Tumor Classification	Jian Yu et.al.	2603.01547	null
2026-03-02	Multimodal Mixture-of-Experts with Retrieval Augmentation for Protein Active Site Identification	Jiayang Wu et.al.	2603.01511	null
2026-03-02	UETrack: A Unified and Efficient Framework for Single Object Tracking	Ben Kang et.al.	2603.01412	null
2026-03-02	Fed-GAME: Personalized Federated Learning with Graph Attention Mixture-of-Experts For Time-Series Forecasting	Yi Li et.al.	2603.01363	null
2026-03-01	Truth as a Trajectory: What Internal Representations Reveal About Large Language Model Reasoning	Hamed Damirchi et.al.	2603.01326	null
2026-03-01	TriMoE: Augmenting GPU with AMX-Enabled CPU and DIMM-NDP for High-Throughput MoE Inference via Offloading	Yudong Pan et.al.	2603.01058	null
2026-03-01	Dr.Occ: Depth- and Region-Guided 3D Occupancy from Surround-View Cameras for Autonomous Driving	Xubo Zhu et.al.	2603.01007	null
2026-02-28	MME: Mixture of Mesh Experts with Random Walk Transformer Gating	Amir Belder et.al.	2603.00828	null
2026-02-27	Quant Experts: Token-aware Adaptive Error Reconstruction with Mixture of Experts for Large Vision-Language Models Quantization	Chenwei Jia et.al.	2602.24059	null
2026-02-26	Brain-OF: An Omnifunctional Foundation Model for fMRI, EEG and MEG	Hanning Guo et.al.	2602.23410	null
2026-02-26	A Mixture-of-Experts Model for Multimodal Emotion Recognition in Conversations	Soumya Dutta et.al.	2602.23300	null
2026-02-26	Learning Physical Operators using Neural Operators	Vignesh Gopakumar et.al.	2602.23113	null
2026-02-26	pMoE: Prompting Diverse Experts Together Wins More in Visual Adaptation	Shentong Mo et.al.	2602.22938	null
2026-02-26	Switch-Hurdle: A MoE Encoder with AR Hurdle Decoder for Intermittent Demand Forecasting	Fabian Muşat et.al.	2602.22685	null
2026-02-26	Predictive variational inference for flexible regression models	Lucas Kock et.al.	2602.22582	null
2026-02-25	NESTOR: A Nested MOE-based Neural Operator for Large-Scale PDE Pre-Training	Dengdi Sun et.al.	2602.22059	null
2026-02-25	Excitation: Momentum For Experts	Sagi Shaier et.al.	2602.21798	null
2026-02-25	Learning from Yesterday's Error: An Efficient Online Learning Method for Traffic Demand Prediction	Xiannan Huang et.al.	2602.21757	null
2026-02-25	TiMi: Empower Time Series Transformers with Multimodal Mixture of Experts	Jiafeng Lin et.al.	2602.21693	null
2026-02-25	Multi-Layer Scheduling for MoE-Based LLM Reasoning	Yifan Sun et.al.	2602.21626	null
2026-02-24	Dual-Branch INS/GNSS Fusion with Inequality and Equality Constraints	Mor Levenhar et.al.	2602.21266	null
2026-02-25	GeCo-SRT: Geometry-aware Continual Adaptation for Robotic Cross-Task Sim-to-Real Transfer	Wenbo Yu et.al.	2602.20871	null
2026-02-24	Wireless Federated Multi-Task LLM Fine-Tuning via Sparse-and-Orthogonal LoRA	Nuocheng Yang et.al.	2602.20492	null
2026-02-23	The Universal Eccentricity Distribution for Dynamical Gravitational-Wave Merger Channels	Mor Rozner et.al.	2602.20110	null
2026-02-23	Counterfactual Understanding via Retrieval-aware Multimodal Modeling for Time-to-Event Survival Prediction	Ha-Anh Hoang Nguyen et.al.	2602.19987	null
2026-02-23	A Replicate-and-Quantize Strategy for Plug-and-Play Load Balancing of Sparse Mixture-of-Experts LLMs	Zijie Liu et.al.	2602.19938	null
2026-02-23	Towards Dexterous Embodied Manipulation via Deep Multi-Sensory Fusion and Sparse Expert Scaling	Yirui Sun et.al.	2602.19764	null
2026-02-23	RAID: Retrieval-Augmented Anomaly Detection	Mingxiu Cai et.al.	2602.19611	null
2026-02-23	Conversational AI for Automated Patient Questionnaire Completion: Development Insights and Design Principles	David Fraile Navarro et.al.	2602.19507	null
2026-02-23	EMS-FL: Federated Tuning of Mixture-of-Experts in Satellite-Terrestrial Networks via Expert-Driven Model Splitting	Angzi Xu et.al.	2602.19485	null
2026-02-22	Robust Exploration in Directed Controller Synthesis via Reinforcement Learning with Soft Mixture-of-Experts	Toshihide Ubukata et.al.	2602.19244	null
2026-02-22	SegMoTE: Token-Level Mixture of Experts for Medical Image Segmentation	Yujie Lu et.al.	2602.19213	null
2026-02-22	JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation	Kai Liu et.al.	2602.19163	null
2026-02-22	Routing-Aware Explanations for Mixture of Experts Graph Models in Malware Detection	Hossein Shokouhinejad et.al.	2602.19025	null
2026-02-21	Give Users the Wheel: Towards Promptable Recommendation Paradigm	Fuyuan Lyu et.al.	2602.18929	null
2026-02-20	Going Down Memory Lane: Scaling Tokens for Video Stream Understanding with Dynamic KV-Cache Memory	Vatsal Agarwal et.al.	2602.18434	null
2026-02-19	Grassmannian Mixture-of-Experts: Concentration-Controlled Routing on Subspace Manifolds	Ibne Farabi Shihab et.al.	2602.17798	null
2026-02-19	Phase-Aware Mixture of Experts for Agentic Reinforcement Learning	Shengtian Yang et.al.	2602.17038	null
2026-02-19	Arcee Trinity Large Technical Report	Varun Singh et.al.	2602.17004	null
2026-02-18	Federated Graph AGI for Cross-Border Insider Threat Intelligence in Government Financial Schemes	Srikumar Nayak et.al.	2602.16109	null
2026-02-17	MoE-Spec: Expert Budgeting for Efficient Speculative Decoding	Bradley McDanel et.al.	2602.16052	null
2026-02-17	ExpertWeaver: Unlocking the Inherent MoE in Dense LLMs with GLU Activation Patterns	Ziyu Zhao et.al.	2602.15521	null
2026-02-16	Mixture-of-Experts under Finite-Rate Gating: Communication--Generalization Trade-offs	Ali Khalesi et.al.	2602.15091	null
2026-02-15	DeepFusion: Accelerating MoE Training via Federated Knowledge Distillation from Heterogeneous Edge Devices	Songyuan Li et.al.	2602.14301	null
2026-02-15	MILD: Multi-Intent Learning and Disambiguation for Proactive Failure Prediction in Intent-based Networking	Md. Kamrul Hossain et.al.	2602.14283	null
2026-02-15	Multi-Agent Debate: A Unified Agentic Framework for Tabular Anomaly Detection	Pinqiao Wang et.al.	2602.14251	null
2026-02-15	Synergistic Intra- and Cross-Layer Regularization Losses for MoE Expert Specialization	Rizhen Hu et.al.	2602.14159	null
2026-02-15	LM-Lexicon: Improving Definition Modeling via Harmonizing Semantic Experts	Yang Liu et.al.	2602.14060	null
2026-02-15	Geometry-Preserving Aggregation for Mixture-of-Experts Embedding Models	Sajjad Kachuee et.al.	2602.14039	null
2026-02-15	Eureka-Audio: Triggering Audio Intelligence in Compact Language Models	Dan Zhang et.al.	2602.13954	null
2026-02-14	Mixture-of-experts Wishart model for covariance matrices with an application to Cancer drug screening	The Tien Mai et.al.	2602.13888	null
2026-02-13	Mixture of Predefined Experts: Maximizing Data Usage on Vertical Federated Learning	Jon Irureta et.al.	2602.12708	null
2026-02-13	Multi-Head Attention as a Source of Catastrophic Forgetting in MoE Transformers	Anrui Chen et.al.	2602.12587	null
2026-02-13	SD-MoE: Spectral Decomposition for Effective Expert Specialization	Ruijun Huang et.al.	2602.12556	null
2026-02-13	Decoder-only Conformer with Modality-aware Sparse Mixtures of Experts for ASR	Jaeyoung Lee et.al.	2602.12546	null
2026-02-12	Extending Puzzle for Mixture-of-Experts Reasoning Models with Application to GPT-OSS Acceleration	Akhiad Bercovich et.al.	2602.11937	null
2026-02-12	LAER-MoE: Load-Adaptive Expert Re-layout for Efficient Mixture-of-Experts Training	Xinyi Liu et.al.	2602.11686	null
2026-02-12	Evolutionary Router Feature Generation for Zero-Shot Graph Anomaly Detection with Mixture-of-Experts	Haiyang Jiang et.al.	2602.11622	null
2026-02-12	Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm	Jinrui Zhang et.al.	2602.11543	null
2026-02-11	Demonstration and performance of an online data selection algorithm for liquid argon time projection chambers using MicroBooNE	MicroBooNE collaboration et.al.	2602.11138	null
2026-02-11	MoEEdit: Efficient and Routing-Stable Knowledge Editing for Mixture-of-Experts LLMs	Yupu Gu et.al.	2602.10965	null
2026-02-11	CMAD: Cooperative Multi-Agent Diffusion via Stochastic Optimal Control	Riccardo Barbano et.al.	2602.10933	null
2026-02-11	VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training	Guobin Shen et.al.	2602.10693	null
2026-02-11	Multimodal Priors-Augmented Text-Driven 3D Human-Object Interaction Generation	Yin Wang et.al.	2602.10659	null
2026-02-11	Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters	Ailin Huang et.al.	2602.10604	null
2026-02-11	Neural Additive Experts: Context-Gated Experts for Controllable Model Additivity	Guangzhi Xiong et.al.	2602.10585	null
2026-02-10	Area-Efficient In-Memory Computing for Mixture-of-Experts via Multiplexing and Caching	Hanyuan Gao et.al.	2602.10254	null
2026-02-10	Diverse Skill Discovery for Quadruped Robots via Unsupervised Learning	Ruopeng Cui et.al.	2602.09767	null
2026-02-10	DR.Experts: Differential Refinement of Distortion-Aware Experts for Blind Image Quality Assessment	Bohan Fu et.al.	2602.09531	null
2026-02-10	SMES: Towards Scalable Multi-Task Recommendation via Expert Sparsity	Yukun Zhang et.al.	2602.09386	null
2026-02-10	Effective MoE-based LLM Compression by Exploiting Heterogeneous Inter-Group Experts Routing Frequency and Information Density	Zhendong Mi et.al.	2602.09316	null
2026-02-09	Generalizing GNNs with Tokenized Mixture of Experts	Xiaoguang Guo et.al.	2602.09258	null
2026-02-09	UI-Venus-1.5 Technical Report	Venus-Team et.al.	2602.09082	null
2026-02-09	DirMoE: Dirichlet-routed Mixture of Experts	Amirhossein Vahidi et.al.	2602.09001	null
2026-02-09	OmniReview: A Large-scale Benchmark and LLM-enhanced Framework for Realistic Reviewer Recommendation	Yehua Huang et.al.	2602.08896	null
2026-02-09	FlexMoRE: A Flexible Mixture of Rank-heterogeneous Experts for Efficient Federatedly-trained Large Language Models	Annemette Brok Pirchert et.al.	2602.08818	null
2026-02-10	MOVA: Towards Scalable and Synchronized Video-Audio Generation	SII-OpenMOSS Team et.al.	2602.08794	null
2026-02-09	Redundancy-Free View Alignment for Multimodal Human Activity Recognition with Arbitrarily Missing Views	Duc-Anh Nguyen et.al.	2602.08755	null
2026-02-09	Large Language Lobotomy: Jailbreaking Mixture-of-Experts via Expert Silencing	Jona te Lintelo et.al.	2602.08741	null
2026-02-09	6G-Bench: An Open Benchmark for Semantic Communication and Network-Level Reasoning with Foundation Models in AI-Native 6G Networks	Mohamed Amine Ferrag et.al.	2602.08675	null
2026-02-09	Fundamental Reasoning Paradigms Induce Out-of-Domain Generalization in Language Models	Mingzi Cao et.al.	2602.08658	null
2026-02-09	Sparse Models, Sparse Safety: Unsafe Routes in Mixture-of-Experts LLMs	Yukun Jiang et.al.	2602.08621	null
2026-02-09	TEAM: Temporal-Spatial Consistency Guided Expert Activation for MoE Diffusion Language Model Acceleration	Linye Wei et.al.	2602.08404	null
2026-02-06	Parameters as Experts: Adapting Vision Models with Dynamic Parameter Routing	Meng Lou et.al.	2602.06862	null
2026-02-06	POP: Online Structural Pruning Enables Efficient Inference of Large Foundation Models	Yi Chen et.al.	2602.06822	null
2026-02-06	HyPER: Bridging Exploration and Exploitation for Scalable LLM Reasoning with Hypothesis Path Expansion and Reduction	Shengxuan Qiu et.al.	2602.06527	null
2026-02-05	To 2:4 Sparsity and Beyond: Neuron-level Activation Function to Accelerate LLM Pre-Training	Meghana Madhyastha et.al.	2602.06183	null
2026-02-05	MoSE: Mixture of Slimmable Experts for Efficient and Adaptive Language Models	Nurbek Tastan et.al.	2602.06154	null
2026-02-05	OmniMoE: An Efficient MoE by Orchestrating Atomic Experts at Scale	Jingze Shi et.al.	2602.05711	null
2026-02-04	Rule-Based Spatial Mixture-of-Experts U-Net for Explainable Edge Detection	Bharadwaj Dogga et.al.	2602.05100	null
2026-02-04	Multi-Head LatentMoE and Head Parallel: Communication-Efficient and Deterministic MoE Parallelism	Chenwei Cui et.al.	2602.04870	null
2026-02-04	ERNIE 5.0 Technical Report	Haifeng Wang et.al.	2602.04705	null
2026-02-04	Let Experts Feel Uncertainty: A Multi-Expert Label Distribution Approach to Probabilistic Time Series Forecasting	Zhen Zhou et.al.	2602.04678	null
2026-02-04	RASA: Routing-Aware Safety Alignment for Mixture-of-Experts Models	Jiacheng Liang et.al.	2602.04448	null
2026-02-04	Mixture of Masters: Sparse Chess Language Models with Player Routing	Giacomo Frisoni et.al.	2602.04447	null
2026-02-04	Expert Selections In MoE Models Reveal (Almost) As Much As Text	Amir Nuriyev et.al.	2602.04105	null
2026-02-03	SpecMD: A Comprehensive Study On Speculative Expert Prefetching	Duc Hoang et.al.	2602.03921	null
2026-02-03	DALI: A Workload-Aware Offloading Framework for Efficient MoE Inference on Local PCs	Zeyu Zhu et.al.	2602.03495	null
2026-02-03	Scaling Continual Learning with Bi-Level Routing Mixture-of-Experts	Meng Lou et.al.	2602.03473	null
2026-02-03	VIRAL: Visual In-Context Reasoning via Analogy in Diffusion Transformers	Zhiwen Li et.al.	2602.03210	null
2026-02-03	Sparsity is Combinatorial Depth: Quantifying MoE Expressivity via Tropical Geometry	Ye Su et.al.	2602.03204	null
2026-02-02	SPARKLING: Balancing Signal Preservation and Symmetry Breaking for Width-Progressive Learning	Qifan Yu et.al.	2602.02472	null
2026-02-02	Indications of Belief-Guided Agency and Meta-Cognitive Monitoring in Large Language Models	Noam Steinmetz Yalon et.al.	2602.02467	null
2026-02-02	From Directions to Regions: Decomposing Activations in Language Models via Local Geometry	Or Shafran et.al.	2602.02464	null
2026-02-02	DFKI-Speech System for WildSpoof Challenge: A robust framework for SASV In-the-Wild	Arnab Das et.al.	2602.02286	null
2026-02-02	MoLF: Mixture-of-Latent-Flow for Pan-Cancer Spatial Gene Expression Prediction from Histology	Susu Hu et.al.	2602.02282	null
2026-02-02	Edge-Aligned Initialization of Kernels for Steered Mixture-of-Experts	Martin Determann et.al.	2602.02031	null
2026-02-02	SAME: Stabilized Mixture-of-Experts for Multimodal Continual Instruction Tuning	Zhen-Hao Xie et.al.	2602.01990	null
2026-02-02	Mixture-of-Experts with Intermediate CTC Supervision for Accented Speech Recognition	Wonjun Lee et.al.	2602.01967	null
2026-02-02	SOPRAG: Multi-view Graph Experts Retrieval for Industrial Standard Operating Procedures	Liangtao Lin et.al.	2602.01858	null
2026-02-02	Mutual-Guided Expert Collaboration for Cross-Subject EEG Classification	Zhi Zhang et.al.	2602.01728	null
2026-01-31	Improving Minimax Estimation Rates for Contaminated Mixture of Multinomial Logistic Experts via Expert Heterogeneity	Fanqi Yan et.al.	2602.00939	null
2026-01-31	Dynamic Expert Sharing: Decoupling Memory from Parallelism in Mixture-of-Experts Diffusion LLMs	Hao Mark Chen et.al.	2602.00879	null
2026-01-31	Toward Reliable Sim-to-Real Predictability for MoE-based Robust Quadrupedal Locomotion	Tianyang Wu et.al.	2602.00678	null
2026-01-31	SEER: Transformer-based Robust Time Series Forecasting via Automated Patch Enhancement and Replacement	Xiangfei Qiu et.al.	2602.00589	null
2026-01-31	PROBE: Co-Balancing Computation and Communication in MoE Inference via Real-Time Predictive Prefetching	Qianchao Zhu et.al.	2602.00509	null
2026-01-30	UrbanMoE: A Sparse Multi-Modal Mixture-of-Experts Framework for Multi-Task Urban Region Profiling	Pingping Liu et.al.	2601.22746	null
2026-01-30	A Step Back: Prefix Importance Ratio Stabilizes Policy Optimization	Shiye Lei et.al.	2601.22718	null
2026-01-30	A Unified Study of LoRA Variants: Taxonomy, Review, Codebase, and Empirical Evaluation	Haonan He et.al.	2601.22708	null
2026-01-30	Test-Time Mixture of World Models for Embodied Agents in Dynamic Environments	Jinwoo Jang et.al.	2601.22647	null
2026-01-30	SpanNorm: Reconciling Training Stability and Performance in Deep Transformers	Chao Wang et.al.	2601.22580	null
2026-01-30	Continual Policy Distillation from Distributed Reinforcement Learning Teachers	Yuxuan Li et.al.	2601.22475	null
2026-01-29	ECO: Quantized Training without Full-Precision Master Weights	Mahdi Nikdan et.al.	2601.22101	null
2026-01-29	MoE-ACT: Improving Surgical Imitation Learning Policies through Supervised Mixture-of-Experts	Lorenzo Mazza et.al.	2601.21971	null
2026-01-29	MoHETS: Long-term Time Series Forecasting with Mixture-of-Heterogeneous-Experts	Evandro S. Ortigossa et.al.	2601.21866	null
2026-01-29	Seg-MoE: Multi-Resolution Segment-wise Mixture-of-Experts for Time Series Forecasting Transformers	Evandro S. Ortigossa et.al.	2601.21641	null
2026-01-29	Multi-Modal Time Series Prediction via Mixture of Modulated Experts	Lige Zhang et.al.	2601.21547	null
2026-01-29	ShardMemo: Masked MoE Routing for Sharded Agentic LLM Memory	Yang Zhao et.al.	2601.21545	null
2026-01-29	L$^3$: Large Lookup Layers	Albert Tseng et.al.	2601.21461	null
2026-01-29	L2R: Low-Rank and Lipschitz-Controlled Routing for Mixture-of-Experts	Minghao Yang et.al.	2601.21349	null
2026-01-29	Abstracting Robot Manipulation Skills via Mixture-of-Experts Diffusion Policies	Ce Hao et.al.	2601.21251	null
2026-01-29	Scaling Embeddings Outperforms Scaling Experts in Language Models	Hong Liu et.al.	2601.21204	null
2026-01-29	ZipMoE: Efficient On-Device MoE Serving via Lossless Compression and Cache-Affinity Scheduling	Yuchen Yang et.al.	2601.21198	null
2026-01-29	BrainStack: Neuro-MoE with Functionally Guided Expert Routing for EEG-Based Language Decoding	Ziyi Zhao et.al.	2601.21148	null
2026-01-29	TRACE: Trajectory Recovery for Continuous Mechanism Evolution in Causal Representation Learning	Shicheng Fan et.al.	2601.21135	null
2026-01-28	ProfInfer: An eBPF-based Fine-Grained LLM Inference Profiler	Bohua Zou et.al.	2601.20755	null
2026-01-28	Unsupervised Ensemble Learning Through Deep Energy-based Models	Ariel Maymon et.al.	2601.20556	null
2026-01-28	OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution	Le Zhang et.al.	2601.20380	null
2026-01-28	OSDEnhancer: Taming Real-World Space-Time Video Super-Resolution with One-Step Diffusion	Shuoyan Wei et.al.	2601.20308	null
2026-01-28	MiLorE-SSL: Scaling Multilingual Capabilities in Self-Supervised Models without Forgetting	Jing Xu et.al.	2601.20300	null
2026-01-28	HE-SNR: Uncovering Latent Logic via Entropy for Guiding Mid-Training on SWE-BENCH	Yueyang Wang et.al.	2601.20255	null
2026-01-28	Control Models for In-IDE Code Completion	Aral de Moor et.al.	2601.20223	null
2026-01-28	Hyperparameter Transfer with Mixture-of-Expert Layers	Tianze Jiang et.al.	2601.20205	null
2026-01-27	Revisiting Incremental Stochastic Majorization-Minimization Algorithms with Applications to Mixture of Experts	TrungKhang Tran et.al.	2601.19811	null
2026-01-27	Component-Level Lesioning of Language Models Reveals Clinically Aligned Aphasia Phenotypes	Yifan Wang et.al.	2601.19723	null
2026-01-27	Dynamic Multi-Expert Projectors with Stabilized Routing for Multilingual Speech Recognition	Isha Pandey et.al.	2601.19451	null
2026-01-26	Fauna Sprout: A lightweight, approachable, developer-ready humanoid robot	Fauna Robotics et.al.	2601.18963	null
2026-01-26	OneVoice: One Model, Triple Scenarios-Towards Unified Zero-shot Voice Conversion	Zhichao Wang et.al.	2601.18094	null
2026-01-26	LatentMoE: Toward Optimal Accuracy per FLOP and Parameter in Mixture of Experts	Venmugil Elango et.al.	2601.18089	null
2026-01-25	Domain-Expert-Guided Hybrid Mixture-of-Experts for Medical AI: Integrating Data-Driven Learning with Clinical Priors	Jinchen Gu et.al.	2601.17977	null
2026-01-25	$\infty$-MoE: Generalizing Mixture of Experts to Infinite Experts	Shota Takashiro et.al.	2601.17680	null
2026-01-24	PILOT: A Perceptive Integrated Low-level Controller for Loco-manipulation over Unstructured Scenes	Xinru Cui et.al.	2601.17440	null
2026-01-23	Least-Loaded Expert Parallelism: Load Balancing An Imbalanced Mixture-of-Experts	Xuan-Phi Nguyen et.al.	2601.17111	null
2026-01-22	FlashMoE: Reducing SSD I/O Bottlenecks via ML-Based Cache Replacement for Mixture-of-Experts Inference on Edge Devices	Byeongju Kim et.al.	2601.17063	null
2026-01-23	GRIP: Algorithm-Agnostic Machine Unlearning for Mixture-of-Experts via Geometric Router Constraints	Andy Zhu et.al.	2601.16905	null
2026-01-23	Mixture-of-Models: Unifying Heterogeneous Agents via N-Way Self-Evaluating Deliberation	Tims Pecerskis et.al.	2601.16863	null
2026-01-23	LongCat-Flash-Thinking-2601 Technical Report	Meituan LongCat Team et.al.	2601.16725	null
2026-01-22	LL-GaussianImage: Efficient Image Representation for Zero-shot Low-Light Enhancement with 2D Gaussian Splatting	Yuhan Chen et.al.	2601.15772	null
2026-01-21	Improving MoE Compute Efficiency by Composing Weight and Data Sparsity	Maciej Kilian et.al.	2601.15370	null
2026-01-21	Mixture-of-Experts Models in Vision: Routing, Optimization, and Generalization	Adam Rokah et.al.	2601.15021	null
2026-01-21	Modeling the Thermal Behavior of Photopolymers for In-Space Fabrication	Jonathan Ericson et.al.	2601.14897	null
2026-01-21	UniRoute: Unified Routing Mixture-of-Experts for Modality-Adaptive Remote Sensing Change Detection	Qingling Shu et.al.	2601.14797	null
2026-01-21	Robustness of Mixtures of Experts to Feature Noise	Dong Sun et.al.	2601.14792	null
2026-01-20	Layer-adaptive Expert Pruning for Pre-Training of Mixture-of-Experts Large Language Models	YuanLab.ai et.al.	2601.14327	null
2026-01-20	Understanding Multilingualism in Mixture-of-Experts LLMs: Routing Mechanism, Expert Specialization, and Layerwise Steering	Yuxin Chen et.al.	2601.14050	null
2026-01-20	DExTeR: Weakly Semi-Supervised Object Detection with Class and Instance Experts for Medical Imaging	Adrien Meyer et.al.	2601.13954	null
2026-01-20	The ALMA survey to Resolve exoKuiper belt Substructures (ARKS) II. The radial structure of debris discs	Yinuo Han et.al.	2601.13670	null
2026-01-20	MN-TSG: Continuous Time Series Generation with Irregular Observations	Xu Zhang et.al.	2601.13534	null
2026-01-19	CLIP-Guided Adaptable Self-Supervised Learning for Human-Centric Visual Tasks	Mingshuang Luo et.al.	2601.13133	null
2026-01-19	Polychronous Wave Computing: Timing-Native Address Selection in Spiking Networks	Natalia G. Berloff et.al.	2601.13079	null
2026-01-19	PASs-MoE: Mitigating Misaligned Co-drift among Router and Experts via Pathway Activation Subspaces for Continual Learning	Zhiyan Hou et.al.	2601.13020	null
2026-01-19	HT-GNN: Hyper-Temporal Graph Neural Network for Customer Lifetime Value Prediction in Baidu Ads	Xiaohui Zhao et.al.	2601.13013	null
2026-01-19	OFA-MAS: One-for-All Multi-Agent System Topology Design based on Mixture-of-Experts Graph Generative Models	Shiyuan Li et.al.	2601.12996	null
2026-01-19	PhyG-MoE: A Physics-Guided Mixture-of-Experts Framework for Energy-Efficient GNSS Interference Recognition	Zhihan Zeng et.al.	2601.12798	null
2026-01-18	The ALMA survey to Resolve exoKuiper belt Substructures (ARKS) V: Comparison between scattered light and thermal emission	J. Milli et.al.	2601.12586	null
2026-01-18	A Mixture of Experts Vision Transformer for High-Fidelity Surface Code Decoding	Hoang Viet Nguyen et.al.	2601.12483	null
2026-01-18	Learning Diverse Skills for Behavior Models with Mixture of Experts	Wangtian Shen et.al.	2601.12397	null
2026-01-18	NADIR: Differential Attention Flow for Non-Autoregressive Transliteration in Indic Languages	Lakshya Tomar et.al.	2601.12389	null
2026-01-18	GazeFormer-MoE: Context-Aware Gaze Estimation via CLIP and MoE Transformer	Xinyuan Zhao et.al.	2601.12316	null
2026-01-18	Facet-Aware Multi-Head Mixture-of-Experts Model with Text-Enhanced Pre-training for Sequential Recommendation	Mingrui Liu et.al.	2601.12301	null
2026-01-17	EMoE: Eigenbasis-Guided Routing for Mixture-of-Experts	Anzhe Cheng et.al.	2601.12137	null
2026-01-17	The ALMA survey to Resolve exoKuiper belt Substructures (ARKS) III: The vertical structure of debris disks	Brianna Zawadzki et.al.	2601.12128	null
2026-01-17	One-Shot Price Forecasting with Covariate-Guided Experts under Privacy Constraints	Ren He et.al.	2601.11977	null
2026-01-16	The ALMA survey to Resolve exoKuiper belt Substructures (ARKS) VII: Optically thick gas with broad CO gaussian local line profiles in the HD 121617 disc	A. Brennan et.al.	2601.11824	null
2026-01-16	Self-Augmented Mixture-of-Experts for QoS Prediction	Kecheng Cai et.al.	2601.11036	null
2026-01-16	RobuMTL: Enhancing Multi-Task Learning Robustness Against Weather Conditions	Tasneem Shaffee et.al.	2601.10921	null
2026-01-15	MoST: Mixing Speech and Text with Modality-Aware Mixture of Experts	Yuxuan Lou et.al.	2601.10272	null
2026-01-15	MMPG: MoE-based Adaptive Multi-Perspective Graph Fusion for Protein Representation Learning	Yusong Wang et.al.	2601.10157	null
2026-01-14	Progressive Mixture-of-Experts with autoencoder routing for continual RANS turbulence modelling	Haoyu Ji et.al.	2601.09305	null
2026-01-15	A.X K1 Technical Report	Sung Jun Cheon et.al.	2601.09200	null
2026-01-14	WiFo-E: A Scalable Wireless Foundation Model for End-to-End FDD Precoding in Communication Networks	Weibo Wen et.al.	2601.09186	null
2026-01-14	Horseshoe Mixtures-of-Experts (HS-MoE)	Nick Polson et.al.	2601.09043	null
2026-01-13	LookAhead: The Optimal Non-decreasing Index Policy for a Time-Varying Holding Cost problem	Keerthana Gurushankar et.al.	2601.08960	null
2026-01-13	MixServe: An Automatic Distributed Serving System for MoE Models with Hybrid Parallelism Based on Fused Communication Algorithm	Bowen Zhou et.al.	2601.08800	null
2026-01-13	LWM-Spectro: A Foundation Model for Wireless Baseband Signal Spectrograms	Namhyun Kim et.al.	2601.08780	null
2026-01-13	M$^2$FMoE: Multi-Resolution Multi-View Frequency Mixture-of-Experts for Extreme-Adaptive Time Series Forecasting	Yaohui Huang et.al.	2601.08631	null
2026-01-13	Taxon: Hierarchical Tax Code Prediction with Semantically Aligned LLM Expert Guidance	Jihang Li et.al.	2601.08418	null
2026-01-13	Deconstructing Pre-training: Knowledge Attribution Analysis in MoE and Dense Models	Bo Wang et.al.	2601.08383	null
2026-01-13	Towards Principled Design of Mixture-of-Experts Language Models under Memory and Inference Constraints	Seng Pei Liew et.al.	2601.08215	null
2026-01-12	Towards Specialized Generalists: A Multi-Task MoE-LoRA Framework for Domain-Specific LLM Adaptation	Yuxin Yang et.al.	2601.07935	null
2026-01-12	Emotional Support Evaluation Framework via Controllable and Diverse Seeker Simulator	Chaewon Heo et.al.	2601.07698	null
2026-01-12	Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models	Xin Cheng et.al.	2601.07372	null
2026-01-11	Solar Open Technical Report	Sungrae Park et.al.	2601.07022	null
2026-01-11	Deep Learning Based Channel Extrapolation for Dual-Band Massive MIMO Systems	Qikai Xiao et.al.	2601.06858	null
2026-01-11	MoE-DisCo: Low Economy Cost Training Mixture-of-Experts Models	Xin Ye et.al.	2601.06857	null
2026-01-11	MoEScore: Mixture-of-Experts-Based Text-Audio Relevance Score Prediction for Text-to-Audio System Evaluation	Bochao Sun et.al.	2601.06829	null
2026-01-11	SecMoE: Communication-Efficient Secure MoE Inference via Select-Then-Compute	Bowen Shen et.al.	2601.06790	null
2026-01-10	Hellinger Multimodal Variational Autoencoders	Huyen Khanh Vo et.al.	2601.06572	null
2026-01-10	Physics-guided foundation model for universal speckle removal in ultrathin multimode fiber imaging	Xianrui Zeng et.al.	2601.06448	null
2026-01-09	Monkey Jump: MoE-Style PEFT for Efficient Multi-Task Learning	Nusrat Jahan Prottasha et.al.	2601.06356	null
2026-01-09	Reconstruction of atmospheric neutrinos in DUNE's horizontal-drift far-detector module	DUNE Collaboration et.al.	2601.05697	null
2026-01-09	Scalable Heterogeneous Graph Learning via Heterogeneous-aware Orthogonal Prototype Experts	Wei Zhou et.al.	2601.05537	null
2026-01-08	MoEBlaze: Breaking the Memory Wall for Efficient MoE Training on Modern GPUs	Jiyuan Zhang et.al.	2601.05296	null
2026-01-08	MoE3D: A Mixture-of-Experts Module for 3D Reconstruction	Zichen Wang et.al.	2601.05208	null
2026-01-08	FaST: Efficient and Effective Long-Horizon Forecasting for Large-Scale Spatial-Temporal Graphs via Mixture-of-Experts	Yiji Zhao et.al.	2601.05174	null
2026-01-08	How to Set the Learning Rate for Large-Scale Pre-training?	Yunhua Zhou et.al.	2601.05049	null
2026-01-08	DR-LoRA: Dynamic Rank LoRA for Mixture-of-Experts Adaptation	Guanzhi Deng et.al.	2601.04823	null
2026-01-07	A Scheduling Framework for Efficient MoE Inference on Edge GPU-NDP Systems	Qi Wu et.al.	2601.03992	null
2026-01-07	Spectral Manifold Regularization for Stable and Modular Routing in Deep MoE Architectures	Ibrahim Delibasoglu et.al.	2601.03889	null
2026-01-07	Variational Inference, Entropy, and Orthogonality: A Unified Theory of Mixture-of-Experts	Ye Su et.al.	2601.03577	null
2026-01-07	CALM: Culturally Self-Aware Language Models	Lingzhi Shen et.al.	2601.03483	null
2026-01-06	The Illusion of Specialization: Unveiling the Domain-Invariant "Standing Committee" in Mixture-of-Experts Models	Yan Wang et.al.	2601.03425	null
2026-01-06	ReCCur: A Recursive Corner-Case Curation Framework for Robust Vision-Language Understanding in Open and Edge Scenarios	Yihan Wei et.al.	2601.03011	null
2026-01-06	MoE Adapter for Large Audio Language Models: Sparsity, Disentanglement, and Gradient-Conflict-Free	Yishu Lei et.al.	2601.02967	null
2026-01-06	MixTTE: Multi-Level Mixture-of-Experts for Scalable and Adaptive Travel Time Estimation	Wenzhao Jiang et.al.	2601.02943	null
2026-01-06	MiMo-V2-Flash Technical Report	Bangjun Xiao et.al.	2601.02780	null
2026-01-05	Routing by Analogy: kNN-Augmented Expert Assignment for Mixture-of-Experts	Boxuan Lyu et.al.	2601.02144	null
2026-01-05	GCR: Geometry-Consistent Routing for Task-Agnostic Continual Anomaly Detection	Joongwon Chae et.al.	2601.01856	null
2026-01-05	K-EXAONE Technical Report	Eunbi Choi et.al.	2601.01739	null
2026-01-05	Yuan3.0 Flash: An Open Multimodal Large Language Model for Enterprise Applications	YuanLab.ai et.al.	2601.01718	null
2026-01-05	Varying-Coefficient Mixture of Experts Model	Qicheng Zhao et.al.	2601.01699	null
2026-01-04	Multi-Subspace Multi-Modal Modeling for Diffusion Models: Estimation, Convergence and Mixture of Experts	Ruofeng Yang et.al.	2601.01475	null
2026-01-04	Making MoE based LLM inference resilient with Tarragon	Songyu Zhang et.al.	2601.01310	null
2026-01-03	MambaFormer: Token-Level Guided Routing Mixture-of-Experts for Accurate and Efficient Clinical Assistance	Hamad Khan et.al.	2601.01260	null
2026-01-02	Reliability Under Randomness: An Empirical Analysis of Sparse and Dense Language Models Across Decoding Temperatures	Kabir Grover et.al.	2601.00942	null
2026-01-02	HFedMoE: Resource-aware Heterogeneous Federated Learning with Mixture-of-Experts	Zihan Fang et.al.	2601.00583	null
2026-01-01	Geometric Regularization in Mixture-of-Experts: The Disconnect Between Weights and Activations	Hyunjun Kim et.al.	2601.00457	null
2026-01-01	Identification and Estimation under Multiple Versions of Treatment: Mixture-of-Experts Approach	Kohei Yoshikawa et.al.	2601.00287	null
2025-12-31	Compute-Accuracy Pareto Frontiers for Open-Source Reasoning Large Language Models	Ákos Prucs et.al.	2512.24776	null
2026-01-01	Sufficient and Necessary Conditions for Eckart-Young like Result for Tubal Tensors	Uria Mor et.al.	2512.24405	null
2025-12-30	Quantum Computing, Ising Formulation, and the Traveling Salesman Problem	Omer Gurevich et.al.	2512.24308	null
2025-12-30	Training Report of TeleChat3-MoE	Xinzhang Liu et.al.	2512.24157	null
2025-12-30	RepetitionCurse: Measuring and Understanding Router Imbalance in Mixture-of-Experts LLMs under DoS Stress	Ruixuan Huang et.al.	2512.23995	null
2025-12-30	Learnable Query Aggregation with KV Routing for Cross-view Geo-localisation	Hualin Ye et.al.	2512.23938	null
2025-12-29	Dynamic Subspace Composition: Efficient Adaptation via Contractive Basis Expansion	Vladimer Khasia et.al.	2512.23448	null
2025-12-29	Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss	Ang Lv et.al.	2512.23447	null
2025-12-30	YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection	Xu Lin et.al.	2512.23273	null
2025-12-28	Trust Region Masking for Long-Horizon LLM Reinforcement Learning	Yingru Li et.al.	2512.23075	null
2025-12-28	FLEX-MoE: Federated Mixture-of-Experts with Load-balanced Expert Assignment	Boyang Zhang et.al.	2512.23070	null
2025-12-28	Viability and Performance of a Private LLM Server for SMBs: A Benchmark Analysis of Qwen3-30B on Consumer-Grade Hardware	Alex Khalil et.al.	2512.23029	null
2025-12-28	Text-Routed Sparse Mixture-of-Experts Model with Explanation and Temporal Alignment for Multi-Modal Sentiment Analysis	Dongning Rao et.al.	2512.22741	null
2025-12-27	Bright 4B: Scaling Hyperspherical Learning for Segmentation in 3D Brightfield Microscopy	Amil Khan et.al.	2512.22423	null
2025-12-26	FUSCO: High-Performance Distributed Data Shuffling via Transformation-Communication Fusion	Zhuoran Zhu et.al.	2512.22036	null
2025-12-26	SWE-RM: Execution-free Feedback For Software Engineering Agents	KaShun Shum et.al.	2512.21919	null
2025-12-26	Accelerate Speculative Decoding with Sparse Computation in Verification	Jikai Wang et.al.	2512.21911	null
2025-12-26	MMCTOP: A Multimodal Textualization and Mixture-of-Experts Framework for Clinical Trial Outcome Prediction	Carolina Aparício et.al.	2512.21897	null
2025-12-25	Spatiotemporal-Untrammelled Mixture of Experts for Multi-Person Motion Prediction	Zheng Yin et.al.	2512.21707	null
2025-12-25	Efficient MoE Inference with Fine-Grained Scheduling of Disaggregated Expert Parallelism	Xinglin Pan et.al.	2512.21487	null
2025-12-24	DeepCQ: General-Purpose Deep-Surrogate Framework for Lossy Compression Quality Prediction	Khondoker Mirazul Mumenin et.al.	2512.21433	null
2025-12-25	GateBreaker: Gate-Guided Attacks on Mixture-of-Expert LLMs	Lichao Wu et.al.	2512.21008	null
2025-12-24	RevFFN: Memory-Efficient Full-Parameter Fine-Tuning of Mixture-of-Experts LLMs with Reversible Blocks	Ningyuan Liu et.al.	2512.20920	null
2025-12-24	NVIDIA Nemotron 3: Efficient and Open Intelligence	NVIDIA et.al.	2512.20856	null
2025-12-23	Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning	NVIDIA et.al.	2512.20848	null
2025-12-23	Defending against adversarial attacks using mixture of experts	Mohammad Meymani et.al.	2512.20821	null
2025-12-23	MoE-DiffuSeq: Enhancing Long-Document Diffusion Models with Sparse Attention and Mixture of Experts	Alexandros Christoforos et.al.	2512.20604	null
2025-12-23	Branch Learning in MRI: More Data, More Models, More Training	Yuyang Li et.al.	2512.20330	null
2025-12-23	Mixture-of-Experts with Gradient Conflict-Driven Subspace Topology Pruning for Emergent Modularity	Yuxing Gan et.al.	2512.20291	null
2025-12-23	Degradation-Aware Metric Prompting for Hyperspectral Image Restoration	Binfeng Wang et.al.	2512.20251	null
2025-12-23	AMoE: Agglomerative Mixture-of-Experts Vision Foundation Model	Sofian Chaybouti et.al.	2512.20157	null
2025-12-22	UCCL-EP: Portable Expert-Parallel Communication	Ziming Mao et.al.	2512.19849	null
2025-12-22	Towards Closed-Loop Embodied Empathy Evolution: Probing LLM-Centric Lifelong Empathic Motion Generation in Unseen Scenarios	Jiawen Wang et.al.	2512.19551	null
2025-12-22	EGM: Efficiently Learning General Motion Tracking Policy for High Dynamic Humanoid Whole-Body Control	Chao Yang et.al.	2512.19043	null
2025-12-21	Tempo as the Stable Cue: Hierarchical Mixture of Tempo and Beat Experts for Music to 3D Dance Generation	Guangtao Lyu et.al.	2512.18804	null
2025-12-21	Rectification Reimagined: A Unified Mamba Model for Image Correction and Rectangling with Prompts	Linwei Qiu et.al.	2512.18718	null
2025-12-21	Remoe: Towards Efficient and Low-Cost MoE Inference in Serverless Computing	Wentao Liu et.al.	2512.18674	null
2025-12-20	Secret mixtures of experts inside your LLM	Enric Boix-Adsera et.al.	2512.18452	null
2025-12-20	MoE Pathfinder: Trajectory-driven Expert Pruning	Xican Yang et.al.	2512.18425	null
2025-12-20	MACE-Dance: Motion-Appearance Cascaded Experts for Music-Driven Dance Video Generation	Kaixing Yang et.al.	2512.18181	null
2025-12-19	MoE-TransMov: A Transformer-based Model for Next POI Prediction in Familiar & Unfamiliar Movements	Ruichen Tan et.al.	2512.17985	null
2025-12-22	SCOPE: Sequential Causal Optimization of Process Interventions	Jakob De Moor et.al.	2512.17629	null
2025-12-18	Bandwidth-Efficient Adaptive Mixture-of-Experts via Low-Rank Compensation	Zhenyu Liu et.al.	2512.17073	null
2025-12-18	Compression is Routing: Reconstruction Error as an Intrinsic Signal for Modular Language Models	Zhongpan Tang et.al.	2512.16963	null
2025-12-18	An Upper Bound on the M/M/k Queue With Deterministic Setup Times	Jalani Williams et.al.	2512.16854	null
2025-12-18	Meta-RL Induces Exploration in Language Agents	Yulun Jiang et.al.	2512.16848	null
2025-12-18	PoseMoE: Mixture-of-Experts Network for Monocular 3D Human Pose Estimation	Mengyuan Liu et.al.	2512.16494	null
2025-12-18	Efficient CPU-GPU Collaborative Inference for MoE-based LLMs on Memory-Limited Systems	En-Ming Huang et.al.	2512.16473	null
2025-12-18	Pretrained Battery Transformer (PBT): A battery life prediction foundation model	Ruifeng Tan et.al.	2512.16334	null
2025-12-19	Sigma-MoE-Tiny Technical Report	Qingguo Hu et.al.	2512.16248	null
2025-12-18	INTELLECT-3: Technical Report	Prime Intellect Team et.al.	2512.16144	null
2025-12-18	Let the Barbarians In: How AI Can Accelerate Systems Performance Research	Audrey Cheng et.al.	2512.14806	null
2025-12-15	SocialNav-MoE: A Mixture-of-Experts Vision Language Model for Socially Compliant Navigation with Reinforcement Fine-Tuning	Tomohito Kawabata et.al.	2512.14757	null
2025-12-16	SketchAssist: A Practical Assistant for Semantic Edits and Precise Local Redrawing	Han Zou et.al.	2512.14140	null
2025-12-16	SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations	Wentao Guo et.al.	2512.14080	null
2025-12-16	Sparsity-Controllable Dynamic Top-p MoE for Large Foundation Model Pre-training	Can Jin et.al.	2512.13996	null
2025-12-13	RAST-MoE-RL: A Regime-Aware Spatio-Temporal MoE Framework for Deep Reinforcement Learning in Ride-Hailing	Yuhan Tang et.al.	2512.13727	null
2025-12-15	StutterFuse: Mitigating Modality Collapse in Stuttering Detection with Jaccard-Weighted Metric Learning and Gated Fusion	Guransh Singh et.al.	2512.13632	null
2025-12-16	Janus: Disaggregating Attention and Experts for Scalable MoE Inference	Zhexiang Zhang et.al.	2512.13525	null
2025-12-15	Automated Information Flow Selection for Multi-scenario Multi-task Recommendation	Chaohua Yang et.al.	2512.13396	null
2025-12-13	Fine-Grained Zero-Shot Learning with Attribute-Centric Representations	Zhi Chen et.al.	2512.12219	null
2025-12-13	MixtureKit: A General Framework for Composing, Training, and Visualizing Mixture-of-Experts Models	Ahmad Chamma et.al.	2512.12121	null
2025-12-11	Enhancing Radiology Report Generation and Visual Grounding using Reinforcement Learning	Benjamin Gundersen et.al.	2512.10691	null
2025-12-11	Unleashing Degradation-Carrying Features in Symmetric U-Net: Simpler and Stronger Baselines for All-in-One Image Restoration	Wenlong Jiao et.al.	2512.10581	null
2025-12-11	Error-Propagation-Free Learned Video Compression With Dual-Domain Progressive Temporal Alignment	Han Li et.al.	2512.10450	null
2025-12-10	Efficient Continual Learning in Neural Machine Translation: A Low-Rank Adaptation Approach	Salvador Carrión et.al.	2512.09910	null
2025-12-10	DynaIP: Dynamic Image Prompt Adapter for Scalable Zero-shot Personalized Text-to-Image Generation	Zhizhong Wang et.al.	2512.09814	null
2025-12-10	M3Net: A Multi-Metric Mixture of Experts Network Digital Twin with Graph Neural Networks	Blessed Guda et.al.	2512.09797	null
2025-12-10	FoundIR-v2: Optimizing Pre-Training Data Mixtures for Image Restoration Foundation Model	Xiang Chen et.al.	2512.09282	null
2025-12-10	Efficient MoE Serving in the Memory-Bound Regime: Balance Activated Experts, Not Tokens	Yanpeng Yu et.al.	2512.09277	null
2025-12-09	Ask, Answer, and Detect: Role-Playing LLMs for Personality Detection with Question-Conditioned Mixture-of-Experts	Yifan Lyu et.al.	2512.08814	null
2025-12-09	What really matters for person re-identification? A Mixture-of-Experts Framework for Semantic Attribute Importance	Athena Psalta et.al.	2512.08697	null
2025-12-09	Prismatic World Model: Learning Compositional Dynamics for Planning in Hybrid Systems	Mingwei Li et.al.	2512.08411	null
2025-12-08	LongCat-Image Technical Report	Meituan LongCat Team et.al.	2512.07584	null
2025-12-08	Search for Light Sterile Neutrinos With Two Neutrino Beams at MicroBooNE	MicroBooNE collaboration et.al.	2512.07159	null
2025-12-09	TrajMoE: Scene-Adaptive Trajectory Planning with Mixture of Experts and Reinforcement Learning	Zebin Xing et.al.	2512.07135	null
2025-12-08	PlantBiMoE: A Bidirectional Foundation Model with SparseMoE for Plant Genomes	Kepeng Lin et.al.	2512.07113	null
2025-12-07	Adaptive Normalization Mamba with Multi Scale Trend Decomposition and Patch MoE Encoding	MinCheol Jeon et.al.	2512.06929	null
2025-12-07	Stable-MoE: Lyapunov-based Token Routing for Distributed Mixture-of-Experts Training over Edge Networks	Long Shi et.al.	2512.06784	null
2025-12-07	Statistic-Augmented, Decoupled MoE Routing and Aggregating in Autonomous Driving	Wei-Bin Kou et.al.	2512.06664	null
2025-12-06	Enhancing Medical Cross-Modal Hashing Retrieval using Dropout-Voting Mixture-of-Experts Fusion	Jaewon Ahn et.al.	2512.06449	null
2025-12-04	The SAM2-to-SAM3 Gap in the Segment Anything Model Family: Why Prompt-Based Expertise Fails in Concept-Driven Image Segmentation	Ranjan Sapkota et.al.	2512.06032	null
2025-12-05	HiMoE-VLA: Hierarchical Mixture-of-Experts for Generalist Vision-Language-Action Policies	Zhiying Du et.al.	2512.05693	null
2025-12-05	ProPhy: Progressive Physical Alignment for Dynamic World Simulation	Zijun Wang et.al.	2512.05564	null
2025-12-05	EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture	Xin He et.al.	2512.04810	null
2025-12-04	Natural Language Actor-Critic: Scalable Off-Policy Learning in Language Space	Joey Hong et.al.	2512.04601	null
2025-12-04	Context-Aware Mixture-of-Experts Inference on CXL-Enabled GPU-NDP Systems	Zehao Fan et.al.	2512.04476	null
2025-12-03	Small Models Achieve Large Language Model Performance: Evaluating Reasoning-Enabled AI for Secure Child Welfare Research	Zia Qi et.al.	2512.04261	null
2025-12-03	OD-MoE: On-Demand Expert Loading for Cacheless Edge-Distributed MoE Inference	Liujianfu Wang et.al.	2512.03927	null
2025-12-04	A Theoretical Framework for Auxiliary-Loss-Free Load Balancing of Sparse Mixture-of-Experts in Large-Scale AI Models	X. Y. Han et.al.	2512.03915	null
2025-12-03	Parsimonious Clustering of Covariance Matrices	Yixi Xu et.al.	2512.03912	null
2025-12-03	CellScout: Visual Analytics for Mining Biomarkers in Cell State Discovery	Rui Sheng et.al.	2512.03485	null
2025-12-03	SSLfmm: An R Package for Semi-Supervised Learning with a Mixed-Missingness Mechanism in Finite Mixture Models	Geoffrey J. McLachlan et.al.	2512.03322	null
2025-12-02	SkyMoE: A Vision-Language Foundation Model for Enhancing Geospatial Interpretation with Mixture of Experts	Jiaqi Liu et.al.	2512.02517	null
2025-12-02	Multi-Domain Enhanced Map-Free Trajectory Prediction with Selective Attention	Wenyi Xiong et.al.	2512.02368	null
2025-12-02	Understanding and Harnessing Sparsity in Unified Multimodal Models	Shwai He et.al.	2512.02351	null
2025-12-01	Towards Unified Video Quality Assessment	Chen Feng et.al.	2512.02224	null
2025-12-01	ManualVLA: A Unified VLA Model for Chain-of-Thought Manual Generation and Robotic Manipulation	Chenyang Gu et.al.	2512.02013	null
2025-12-01	Multimodal Mixture-of-Experts for ISAC in Low-Altitude Wireless Networks	Kai Zhang et.al.	2512.01750	null
2025-12-01	GRASP: Guided Residual Adapters with Sample-wise Partitioning	Felix Nützel et.al.	2512.01675	null
2025-12-01	Bridging the Scale Gap: Balanced Tiny and General Object Detection in Remote Sensing Imagery	Zhicheng Zhao et.al.	2512.01665	null
2025-12-01	Cuffless Blood Pressure Estimation from Six Wearable Sensor Modalities in Multi-Motion-State Scenarios	Yiqiao Chen et.al.	2512.01653	null
2025-12-02	Stabilizing Reinforcement Learning with LLMs: Formulation and Practices	Chujie Zheng et.al.	2512.01374	null
2025-12-01	Efficient Training of Diffusion Mixture-of-Experts Models: A Practical Recipe	Yahui Liu et.al.	2512.01252	null
2025-11-30	Elastic Mixture of Rank-Wise Experts for Knowledge Reuse in Federated Fine-Tuning	Yebo Wu et.al.	2512.00902	null
2025-11-30	Upcycled and Merged MoE Reward Model for Mitigating Reward Hacking	Lingling Fu et.al.	2512.00724	null
2025-11-29	GCMCG: A Clustering-Aware Graph Attention and Expert Fusion Network for Multi-Paradigm, Multi-task, and Cross-Subject EEG Decoding	Yiqiao Chen et.al.	2512.00574	null
2025-11-28	Hunyuan-GameCraft-2: Instruction-following Interactive Game World Model	Junshu Tang et.al.	2511.23429	null
2025-11-28	LFM2 Technical Report	Alexander Amini et.al.	2511.23404	null
2025-11-28	Chart2Code-MoLA: Efficient Multi-Modal Code Generation via Adaptive Expert Routing	Yifei Wang et.al.	2511.23321	null
2025-11-28	Multi-Modal Scene Graph with Kolmogorov-Arnold Experts for Audio-Visual Question Answering	Zijian Fu et.al.	2511.23304	null
2025-11-28	Experts are all you need: A Composable Framework for Large Language Model Inference	Shrihari Sridharan et.al.	2511.22955	null
2025-11-28	EnECG: Efficient Ensemble Learning for Electrocardiogram Multi-task Foundation Model	Yuhao Xu et.al.	2511.22935	null
2025-11-27	OmniInfer: System-Wide Acceleration Techniques for Optimizing LLM Serving Throughput and Latency	Jun Wang et.al.	2511.22481	null
2025-11-27	Foundation Model for Intelligent Wireless Communications	Boxun Liu et.al.	2511.22222	null
2025-11-27	MoE3D: Mixture of Experts meets Multi-Modal 3D Understanding	Yu Li et.al.	2511.22103	null
2025-11-27	Qwen3-VL Technical Report	Shuai Bai et.al.	2511.21631	null
2025-11-26	MemFine: Memory-Aware Fine-Grained Scheduling for MoE Training	Lu Zhao et.al.	2511.21431	null
2025-11-26	MLPMoE: Zero-Shot Architectural Metamorphosis of Dense LLM MLPs into Static Mixture-of-Experts	Ivan Novikov et.al.	2511.21089	null
2025-11-25	HBridge: H-Shape Bridging of Heterogeneous Experts for Unified Multimodal Understanding and Generation	Xiang Wang et.al.	2511.20520	null
2025-11-25	MTBBench: A Multimodal Sequential Clinical Decision-Making Benchmark in Oncology	Kiril Vasilev et.al.	2511.20490	null
2025-11-25	Soft Adaptive Policy Optimization	Chang Gao et.al.	2511.20347	null
2025-11-25	ADNet: A Large-Scale and Extensible Multi-Domain Benchmark for Anomaly Detection Across 380 Real-World Categories	Hai Ling et.al.	2511.20169	null
2025-11-25	Adaptive Knowledge Transfer for Cross-Disciplinary Cold-Start Knowledge Tracing	Yulong Deng et.al.	2511.20009	null
2025-11-25	Mosaic Pruning: A Hierarchical Framework for Generalizable Pruning of Mixture-of-Experts Models	Wentao Hu et.al.	2511.19822	null
2025-11-22	Exploiting the Experts: Unauthorized Compression in MoE-LLMs	Pinaki Prasad Guha Neogi et.al.	2511.19480	null
2025-11-24	OrdMoE: Preference Alignment via Hierarchical Expert Group Ranking in Multimodal Mixture-of-Experts LLMs	Yuting Gao et.al.	2511.19023	null
2025-11-24	Dynamic Mixture of Experts Against Severe Distribution Shifts	Donghu Kim et.al.	2511.18987	null
2025-11-23	HiFi-MambaV2: Hierarchical Shared-Routed MoE for High-Fidelity MRI Reconstruction	Pengcheng Fang et.al.	2511.18534	null
2025-11-23	Attosecond-resolved quantum fluctuations of light and matter	Matan Even Tzur et.al.	2511.18362	null
2025-11-23	AnyExperts: On-Demand Expert Allocation for Multimodal Language Models with Mixture of Expert	Yuting Gao et.al.	2511.18314	null
2025-11-22	PromptMoE: Generalizable Zero-Shot Anomaly Detection via Visually-Guided Prompt Mixtures	Yuheng Shao et.al.	2511.18116	null
2025-11-22	CADTrack: Learning Contextual Aggregation with Deformable Alignment for Robust RGBT Tracking	Hao Li et.al.	2511.17967	null
2025-11-22	FastMMoE: Accelerating Multimodal Large Language Models through Dynamic Expert Activation and Routing-Aware Token Pruning	Guoyang Xia et.al.	2511.17885	null
2025-11-22	Equivalence of Context and Parameter Updates in Modern Transformer Blocks	Adrian Goldwaser et.al.	2511.17864	null
2025-11-21	Unified Class and Domain Incremental Learning with Mixture of Experts for Indoor Localization	Akhil Singampalli et.al.	2511.17829	null
2025-11-21	Sparse Mixture-of-Experts for Multi-Channel Imaging: Are All Channel Interactions Required?	Sukwon Yun et.al.	2511.17400	null
2025-11-21	MCMoE: Completing Missing Modalities with Mixture of Experts for Incomplete Multimodal Action Quality Assessment	Huangbiao Xu et.al.	2511.17397	null
2025-11-21	Measurements of differential charged-current cross sections on argon for electron neutrinos with final-state protons in MicroBooNE	MicroBooNE collaboration et.al.	2511.17342	null
2025-11-21	Training Foundation Models on a Full-Stack AMD Platform: Compute, Networking, and System Design	Quentin Anthony et.al.	2511.17127	null
2025-11-21	VLM-Augmented Degradation Modeling for Image Restoration Under Adverse Weather Conditions	Qianyi Shao et.al.	2511.16998	null
2025-11-21	RadioKMoE: Knowledge-Guided Radiomap Estimation with Kolmogorov-Arnold Networks and Mixture-of-Experts	Fupei Guo et.al.	2511.16986	null
2025-11-21	MicroMoE: Fine-Grained Load Balancing for Mixture-of-Experts with Token Scheduling	Chenqi Zhao et.al.	2511.16947	null
2025-11-20	Mixture of Ranks with Degradation-Aware Routing for One-Step Real-World Image Super-Resolution	Xiao He et.al.	2511.16024	null
2025-11-19	AquaSentinel: Next-Generation AI System Integrating Sensor Networks for Urban Underground Water Pipeline Anomaly Detection via Collaborative MoE-LLM Agent Architecture	Qiming Guo et.al.	2511.15870	null
2025-11-19	MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping	Yushi Huang et.al.	2511.15690	null
2025-11-19	VIRAL: Visual Sim-to-Real at Scale for Humanoid Loco-Manipulation	Tairan He et.al.	2511.15200	null
2025-11-19	GPU-Initiated Networking for NCCL	Khaled Hamidouche et.al.	2511.15076	null
2025-11-19	WiCo-PG: Wireless Channel Foundation Model for Pathloss Map Generation via Synesthesia of Machines	Mingran Sun et.al.	2511.15030	null
2025-11-19	WiCo-MG: Wireless Channel Foundation Model for Multipath Generation via Synesthesia of Machines	Zengrui Han et.al.	2511.15026	null
2025-11-19	Dynamic Expert Quantization for Scalable Mixture-of-Experts Inference	Kexin Chu et.al.	2511.15015	null
2025-11-18	HMC: Learning Heterogeneous Meta-Control for Contact-Rich Loco-Manipulation	Lai Wei et.al.	2511.14756	null
2025-11-18	Towards Stable and Structured Time Series Generation with Perturbation-Aware Flow Matching	Jintao Zhang et.al.	2511.14488	null
2025-11-18	MoE-SpeQ: Speculative Quantized Decoding with Proactive Expert Prefetching and Offloading for Mixture-of-Experts	Wenfeng Wang et.al.	2511.14102	null
2025-11-18	FAPE-IR: Frequency-Aware Planning and Execution Framework for All-in-One Image Restoration	Jingren Liu et.al.	2511.14099	null
2025-11-18	SMGeo: Cross-View Object Geo-Localization with Grid-Level Mixture-of-Experts	Fan Zhang et.al.	2511.14093	null
2025-11-17	MoMoE: A Mixture of Expert Agent Model for Financial Sentiment Analysis	Peng Shu et.al.	2511.13983	null
2025-11-17	Introducing AI to an Online Petition Platform Changed Outputs but not Outcomes	Isabel Corpus et.al.	2511.13949	null
2025-11-17	InterMoE: Individual-Specific 3D Human Interaction Generation via Dynamic Temporal-Selective MoE	Lipeng Wang et.al.	2511.13488	null
2025-11-17	Measurement of Exclusive $π^+$-argon Interactions Using ProtoDUNE-SP	DUNE Collaboration et.al.	2511.13462	null
2025-11-18	YOLO Meets Mixture-of-Experts: Adaptive Expert Routing for Robust Object Detection	Ori Meiraz et.al.	2511.13344	null
2025-11-17	Self-Adaptive Graph Mixture of Models	Mohit Meena et.al.	2511.13062	null
2025-11-17	Tokenize Once, Recommend Anywhere: Unified Item Tokenization for Multi-domain LLM-based Recommendation	Yu Hou et.al.	2511.12922	null
2025-11-16	Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data	Yunxin Li et.al.	2511.12609	null
2025-11-16	SEMC: Structure-Enhanced Mixture-of-Experts Contrastive Learning for Ultrasound Standard Plane Recognition	Qing Cai et.al.	2511.12559	null
2025-11-16	MdaIF: Robust One-Stop Multi-Degradation-Aware Image Fusion with Language-Driven Semantics	Jing Li et.al.	2511.12525	null
2025-11-16	MOON2.0: Dynamic Modality-balanced Multimodal Representation Learning for E-commerce Product Understanding	Zhanheng Nie et.al.	2511.12449	null
2025-11-15	SAC-MoE: Reinforcement Learning with Mixture-of-Experts for Control of Hybrid Dynamical Systems with Uncertainty	Leroy D'Souza et.al.	2511.12361	null
2025-11-15	AMR-MoEGA: Antimicrobial Resistance Prediction using Mixture of Experts and Genetic Algorithms	Anshul Bagaria et.al.	2511.12223	null
2025-11-15	ViTE: Virtual Graph Trajectory Expert Router for Pedestrian Trajectory Prediction	Ruochen Li et.al.	2511.12214	null
2025-11-14	First Measurement of $π^+$-Ar and $p$-Ar Total Inelastic Cross Sections in the Sub-GeV Energy Regime with ProtoDUNE-SP Data	DUNE Collaboration et.al.	2511.11925	null
2025-11-14	FarSkip-Collective: Unhobbling Blocking Communication in Mixture of Experts Models	Yonatan Dukler et.al.	2511.11505	null
2025-11-14	Rethinking Efficient Mixture-of-Experts for Remote Sensing Modality-Missing Classification	Qinghao Gao et.al.	2511.11460	null
2025-11-14	Parameter-Efficient MoE LoRA for Few-Shot Multi-Style Editing	Cong Cao et.al.	2511.11236	null
2025-11-14	DoReMi: A Domain-Representation Mixture Framework for Generalizable 3D Understanding	Mingwei Xing et.al.	2511.11232	null
2025-11-14	ERMoE: Eigen-Reparameterized Mixture-of-Experts for Stable Routing and Interpretable Specialization	Anzhe Cheng et.al.	2511.10971	null
2025-11-14	Go-UT-Bench: A Fine-Tuning Dataset for LLM-Based Unit Test Generation in Go	Yashshi Pipalani et.al.	2511.10868	null
2025-11-13	Generalizable Slum Detection from Satellite Imagery with Mixture-of-Experts	Sumin Lee et.al.	2511.10300	null
2025-11-13	RobIA: Robust Instance-aware Continual Test-time Adaptation for Deep Stereo	Jueun Ko et.al.	2511.10107	null
2025-11-13	BuddyMoE: Exploiting Expert Redundancy to Accelerate Memory-Constrained Mixture-of-Experts Inference	Yun Wang et.al.	2511.10054	null
2025-11-13	ConSurv: Multimodal Continual Learning for Survival Analysis	Dianzhi Yu et.al.	2511.09853	null
2025-11-12	UniMM-V2X: MoE-Enhanced Multi-Level Fusion for End-to-End Cooperative Autonomous Driving	Ziyi Song et.al.	2511.09013	null
2025-11-12	Selective Sinkhorn Routing for Improved Sparse Mixture of Experts	Duc Anh Nguyen et.al.	2511.08972	null
2025-11-12	Bayesian Mixture of Experts For Large Language Models	Maryam Dialameh et.al.	2511.08968	null
2025-11-11	OmniAID: Decoupling Semantic and Artifacts for Universal AI-Generated Image Detection in the Wild	Yuncheng Guo et.al.	2511.08423	null
2025-11-11	Text-based Aerial-Ground Person Retrieval	Xinyu Zhou et.al.	2511.08369	null
2025-11-13	National Institute on Aging PREPARE Challenge: Early Detection of Cognitive Impairment Using Speech -- The SpeechCARE Solution	Maryam Zolnoori et.al.	2511.08132	null
2025-11-10	Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs	Zhongyang Li et.al.	2511.07419	null
2025-11-10	AgenticSciML: Collaborative Multi-Agent Systems for Emergent Discovery in Scientific Machine Learning	Qile Jiang et.al.	2511.07262	null
2025-11-10	S-DAG: A Subject-Based Directed Acyclic Graph for Multi-Agent Heterogeneous Reasoning	Jiangwen Dong et.al.	2511.06727	null
2025-11-10	Multi-Modal Continual Learning via Cross-Modality Adapters and Representation Alignment with Knowledge Preservation	Evelyn Chee et.al.	2511.06723	null
2025-11-09	Route Experts by Sequence, not by Token	Tiansheng Wen et.al.	2511.06494	null
2025-11-09	HyMoERec: Hybrid Mixture-of-Experts for Sequential Recommendation	Kunrong Li et.al.	2511.06388	null
2025-11-09	A Mixture-of-Experts Framework with Log-Logistic Components for Survival Analysis on Histopathology Images	Ardhendu Sekhar et.al.	2511.06266	null
2025-11-08	DiA-gnostic VLVAE: Disentangled Alignment-Constrained Vision Language Variational AutoEncoder for Robust Radiology Reporting with Missing Modalities	Nagur Shareef Shaik et.al.	2511.05968	null
2025-11-08	MoEGCL: Mixture of Ego-Graphs Contrastive Representation Learning for Multi-View Clustering	Jian Zhu et.al.	2511.05876	null
2025-11-08	In-depth Analysis on Caching and Pre-fetching in Mixture of Experts Offloading	Shuning Lin et.al.	2511.05814	null
2025-11-07	MoE-DP: An MoE-Enhanced Diffusion Policy for Robust Long-Horizon Robotic Manipulation with Skill Decomposition and Failure Recovery	Baiye Cheng et.al.	2511.05007	null
2025-11-06	PuzzleMoE: Efficient Compression of Large Mixture-of-Experts Models via Sparse Expert Merging and Bit-packed inference	Yushu Zhao et.al.	2511.04805	null
2025-11-06	GNN-MoE: Context-Aware Patch Routing using GNNs for Parameter-Efficient Domain Generalization	Mahmoud Soliman et.al.	2511.04008	null
2025-11-05	GMoPE: A Prompt-Expert Mixture Framework for Graph Foundation Models	Zhibin Wang et.al.	2511.03251	null
2025-11-04	RoME: Domain-Robust Mixture-of-Experts for MILP Solution Prediction across Domains	Tianle Pu et.al.	2511.02331	null
2025-11-04	FP8-Flow-MoE: A Casting-Free FP8 Recipe without Double Quantization Error	Fengjuan Wang et.al.	2511.02302	null
2025-11-04	Opportunistic Expert Activation: Batch-Aware Expert Routing for Faster Decode Without Retraining	Costin-Andrei Oncescu et.al.	2511.02237	null
2025-11-03	Towards Efficient Federated Learning of Networked Mixture-of-Experts for Mobile Edge Computing	Song Gao et.al.	2511.01743	null
2025-11-03	HMVLM: Human Motion-Vision-Language Model via MoE LoRA	Lei Hu et.al.	2511.01463	null
2025-11-04	CryptoMoE: Privacy-Preserving and Scalable Mixture of Experts Inference via Balanced Expert Routing	Yifan Zhou et.al.	2511.01197	null
2025-11-03	DEER: Disentangled Mixture of Experts with Instance-Adaptive Routing for Generalizable Machine-Generated Text Detection	Guoxin Ma et.al.	2511.01192	null
2025-11-01	OmniTrack++: Omnidirectional Multi-Object Tracking by Learning Large-FoV Trajectory Feedback	Kai Luo et.al.	2511.00510	null
2025-10-31	LongCat-Flash-Omni Technical Report	Meituan LongCat Team et.al.	2511.00279	null
2025-10-31	Phased DMD: Few-step Distribution Matching Distillation via Score Matching within Subintervals	Xiangyu Fan et.al.	2510.27684	null
2025-10-31	RDMA Point-to-Point Communication for LLM Systems	Nandor Licker et.al.	2510.27656	null
2025-10-31	MoRE: 3D Visual Geometry Reconstruction Meets Mixture-of-Experts	Jingnan Gao et.al.	2510.27234	null
2025-10-31	AFM-Net: Advanced Fusing Hierarchical CNN Visual Priors with Global Sequence Modeling for Remote Sensing Image Scene Classification	Yuanhao Tang et.al.	2510.27155	null
2025-10-30	Adaptive Data Flywheel: Applying MAPE Control Loops to AI Agent Improvement	Aaditya Shukla et.al.	2510.27051	null
2025-10-30	Mixture-of-Transformers Learn Faster: A Theoretical Study on Classification Problems	Hongbo Li et.al.	2510.27004	null
2025-10-30	MoME: Mixture of Visual Language Medical Experts for Medical Imaging Segmentation	Arghavan Rezvani et.al.	2510.26996	null
2025-10-30	ExpertFlow: Adaptive Expert Scheduling and Memory Coordination for Efficient MoE Inference	Zixu Shen et.al.	2510.26730	null
2025-10-30	Low-Altitude UAV-Carried Movable Antenna for Joint Wireless Power Transfer and Covert Communications	Chuang Zhang et.al.	2510.26628	null
2025-10-30	MossNet: Mixture of State-Space Experts is a Multi-Head Attention	Shikhar Tuli et.al.	2510.26182	null
2025-10-29	Dual Mixture-of-Experts Framework for Discrete-Time Survival Analysis	Hyeonjun Lee et.al.	2510.26014	null
2025-10-31	Mixture-of-Experts Operator Transformer for Large-Scale PDE Pre-Training	Hong Wang et.al.	2510.25803	null
2025-10-29	Revisiting scalable sequential recommendation with Multi-Embedding Approach and Mixture-of-Experts	Qiushi Pan et.al.	2510.25285	null
2025-10-29	MoEntwine: Unleashing the Potential of Wafer-scale Chips for Large-scale Expert Parallel Inference	Xinru Tang et.al.	2510.25258	null
2025-10-29	H3M-SSMoEs: Hypergraph-based Multimodal Learning with LLM Reasoning and Style-Structured Mixture of Experts	Peilin Tan et.al.	2510.25091	null
2025-10-28	Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation	Inclusion AI et.al.	2510.24821	null
2025-10-28	Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance	Yujie Wei et.al.	2510.24711	null
2025-10-28	Language-Conditioned Representations and Mixture-of-Experts Policy for Robust Multi-Task Robotic Manipulation	Xiucheng Zhang et.al.	2510.24055	null
2025-10-26	Sparsity and Superposition in Mixture of Experts	Marmik Chaudhari et.al.	2510.23671	null
2025-10-27	EMTSF: Extraordinary Mixture of SOTA Models for Time Series Forecasting	Musleh Alharthi et.al.	2510.23396	null
2025-10-27	Rethinking GSPO: The Perplexity-Entropy Equivalence	Chi Liu et.al.	2510.23142	null
2025-10-27	Towards Stable and Effective Reinforcement Learning for Mixture-of-Experts	Di Zhang et.al.	2510.23027	null
2025-10-27	MoEMeta: Mixture-of-Experts Meta Learning for Few-Shot Relational Learning	Han Wu et.al.	2510.23013	null
2025-10-25	Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation	Ling-Team et.al.	2510.22115	null
2025-10-24	PINN Balls: Scaling Second-Order Methods for PINNs with Domain Decomposition and Adaptive Sampling	Andrea Bonfanti et.al.	2510.21262	null
2025-10-24	Adaptive Graph Mixture of Residual Experts: Unsupervised Learning on Diverse Graphs with Heterogeneous Specialization	Yunlong Chu et.al.	2510.21207	null
2025-10-24	Controllable-LPMoE: Adapting to Challenging Object Segmentation via Dynamic Local Priors from Mixture-of-Experts	Yanguang Sun et.al.	2510.21114	null
2025-10-24	MedAlign: A Synergistic Framework of Multimodal Preference Optimization and Federated Meta-Cognitive Reasoning	Siyong Chen et.al.	2510.21093	null
2025-10-23	Bayesian Jammer Localization with a Hybrid CNN and Path-Loss Mixture of Experts	Mariona Jaramillo-Civill et.al.	2510.20666	null
2025-10-23	xTime: Extreme Event Prediction with Hierarchical Knowledge Distillation and Expert Fusion	Quan Li et.al.	2510.20651	null
2025-10-23	Metis-HOME: Hybrid Optimized Mixture-of-Experts for Multimodal Reasoning	Xiaohan Lan et.al.	2510.20519	null
2025-10-23	A Parameter-Efficient Mixture-of-Experts Framework for Cross-Modal Geo-Localization	LinFeng Li et.al.	2510.20291	null
2025-10-23	AsyncHZP: Hierarchical ZeRO Parallelism with Asynchronous Scheduling for Scalable LLM Training	Huawei Bai et.al.	2510.20111	null
2025-10-22	HybridEP: Scaling Expert Parallelism to Cross-Datacenter Scenario via Hybrid Expert/Data Transmission	Weihao Yang et.al.	2510.19470	null
2025-10-22	MoE-Prism: Disentangling Monolithic Experts for Elastic MoE Services via Model-System Co-Designs	Xinfeng Xia et.al.	2510.19366	null
2025-10-22	Modeling Turn-Taking with Semantically Informed Gestures	Varsha Suresh et.al.	2510.19350	null
2025-10-23	RailS: Load Balancing for All-to-All Communication in Distributed Mixture-of-Experts Training	Heng Xu et.al.	2510.19262	null
2025-10-22	A Design Science Blueprint for an Orchestrated AI Assistant in Doctoral Supervision	Teo Susnjak et.al.	2510.19227	null
2025-10-22	MoE-GS: Mixture of Experts for Dynamic Gaussian Splatting	In-Hwan Jin et.al.	2510.19210	null
2025-10-21	Unifying and Enhancing Graph Transformers via a Hierarchical Mask Framework	Yujie Xing et.al.	2510.18825	null
2025-10-21	Noise-Conditioned Mixture-of-Experts Framework for Robust Speaker Verification	Bin Gu et.al.	2510.18533	null
2025-10-21	Training Diverse Graph Experts for Ensembles: A Systematic Empirical Study	Gangda Deng et.al.	2510.18370	null
2025-10-19	L-MoE: End-to-End Training of a Lightweight Mixture of Low-Rank Adaptation Experts	Shihao Ji et.al.	2510.17898	null
2025-10-20	Towards 3D Objectness Learning in an Open World	Taichi Liu et.al.	2510.17686	null
2025-10-20	Intelligent Communication Mixture-of-Experts Boosted-Medical Image Segmentation Foundation Model	Xinwei Zhang et.al.	2510.17684	null
2025-10-20	Learned Inertial Odometry for Cycling Based on Mixture of Experts Algorithm	Hao Qiao et.al.	2510.17604	null
2025-10-20	ReXMoE: Reusing Experts with Minimal Overhead in Mixture-of-Experts	Zheyue Tan et.al.	2510.17483	null
2025-10-19	End-to-end Listen, Look, Speak and Act	Siyin Wang et.al.	2510.16756	null
2025-10-18	NeurIPT: Foundation Model for Neural Interfaces	Zitao Fang et.al.	2510.16548	null
2025-10-18	Input Domain Aware MoE: Decoupling Routing Decisions from Task Optimization in Mixture of Experts	Yongxiang Hua et.al.	2510.16448	null
2025-10-18	Modeling Expert Interactions in Sparse Mixture of Experts via Graph Structures	Minh-Khoi Nguyen-Nhat et.al.	2510.16411	null
2025-10-17	Expert Merging in Sparse Mixture of Experts with Nash Bargaining	Dung V. Nguyen et.al.	2510.16138	null
2025-10-17	Mixture of Experts Approaches in Dense Retrieval Tasks	Effrosyni Sokli et.al.	2510.15683	null
2025-10-17	FlexiReID: Adaptive Mixture of Expert for Multi-Modal Person Re-Identification	Zhen Sun et.al.	2510.15595	null
2025-10-17	Backdoor or Manipulation? Graph Mixture of Experts Can Defend Against Various Graph Adversarial Attacks	Yuyuan Feng et.al.	2510.15333	null
2025-10-17	MTmixAtt: Integrating Mixture-of-Experts with Multi-Mix Attention for Large-Scale Recommendation	Xianyang Qi et.al.	2510.15286	null
2025-10-17	Adaptive Individual Uncertainty under Out-Of-Distribution Shift with Expert-Routed Conformal Prediction	Amitesh Badkul et.al.	2510.15233	null
2025-10-16	Rewiring Experts on the Fly: Continuous Rerouting for Better Online Adaptation in Mixture-of-Expert models	Guinan Su et.al.	2510.14853	null
2025-10-16	MergeMoE: Efficient Compression of MoE Models via Expert Output Merging	Ruijie Miao et.al.	2510.14436	null
2025-10-16	Expertise need not monopolize: Action-Specialized Mixture of Experts for Vision-Language-Action Learning	Weijie Shen et.al.	2510.14300	null
2025-10-16	MACE: Mixture-of-Experts Accelerated Coordinate Encoding for Large-Scale Scene Localization and Rendering	Mingkai Liu et.al.	2510.14251	null
2025-10-15	REAP the Experts: Why Pruning Prevails for One-Shot MoE compression	Mike Lasby et.al.	2510.13999	null
2025-10-15	Steer-MoE: Efficient Audio-Language Alignment with a Mixture-of-Experts Steering Module	Ruitao Feng et.al.	2510.13558	null
2025-10-15	ExpressNet-MoE: A Hybrid Deep Neural Network for Emotion Recognition	Deeptimaan Banerjee et.al.	2510.13493	null
2025-10-15	Who Speaks for the Trigger? Dynamic Expert Routing in Backdoored Mixture-of-Experts Transformers	Xin Zhao et.al.	2510.13462	null
2025-10-15	Toward Efficient Inference Attacks: Shadow Model Sharing via Mixture-of-Experts	Li Bai et.al.	2510.13451	null
2025-10-15	UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE	Zhenyu Liu et.al.	2510.13344	null
2025-10-15	GatePro: Parameter-Free Expert Selection Optimization for Mixture-of-Experts Models	Chen Zheng et.al.	2510.13079	null
2025-10-14	Dendrograms of Mixing Measures for Softmax-Gated Gaussian Mixture of Experts: Consistency without Model Sweeps	Do Tien Hai et.al.	2510.12744	null
2025-10-14	MoBiLE: Efficient Mixture-of-Experts Inference on Consumer GPU with Mixture of Big Little Experts	Yushu Zhao et.al.	2510.12357	null
2025-10-14	DE3S: Dual-Enhanced Soft-Sparse-Shape Learning for Medical Early Time-Series Classification	Tao Xie et.al.	2510.12214	null
2025-10-13	Beyond 'Templates': Category-Agnostic Object Pose, Size, and Shape Estimation from a Single View	Jinyu Zhang et.al.	2510.11687	null
2025-10-13	Robust Ego-Exo Correspondence with Long-Term Memory	Yijun Hu et.al.	2510.11417	null
2025-10-13	Stabilizing MoE Reinforcement Learning by Aligning Training and Inference Routers	Wenhan Ma et.al.	2510.11370	null
2025-10-13	What to expect from microscopic nuclear modelling for $k_{\rm eff}$ calculations?	D. Rochman et.al.	2510.11256	null
2025-10-13	MC#: Mixture Compressor for Mixture-of-Experts Large Models	Wei Huang et.al.	2510.10962	null
2025-10-12	Crisis-Aware Regime-Conditioned Diffusion with CVaR Allocation	Ali Atiah Alzahrani et.al.	2510.10807	null
2025-10-12	Equipping Vision Foundation Model with Mixture of Experts for Out-of-Distribution Detection	Shizhen Zhao et.al.	2510.10584	null
2025-10-12	Hierarchical LoRA MoE for Efficient CTR Model Scaling	Zhichen Zeng et.al.	2510.10432	null
2025-10-11	SP-MoE: Speculative Decoding and Prefetching for Accelerating MoE-based Model Inference	Liangkun Chen et.al.	2510.10302	null
2025-10-10	MTMD: A Multi-Task Multi-Domain Framework for Unified Ad Lightweight Ranking at Pinterest	Xiao Yang et.al.	2510.09857	null
2025-10-10	Dense2MoE: Restructuring Diffusion Transformer to MoE for Efficient Text-to-Image Generation	Youwei Zheng et.al.	2510.09094	null
2025-10-09	LinearSR: Unlocking Linear Attention for Stable and Efficient Image Super-Resolution	Xiaohui Li et.al.	2510.08771	null
2025-10-09	FlyLoRA: Boosting Task Decoupling and Parameter Efficiency via Implicit Rank-Wise Mixture-of-Experts	Heming Zou et.al.	2510.08396	null
2025-10-09	Mix- and MoE-DPO: A Variational Inference Approach to Direct Preference Optimization	Jason Bohne et.al.	2510.08256	null
2025-10-09	From Tokens to Layers: Redefining Stall-Free Scheduling for LLM Serving with Layered Prefill	Gunjun Lee et.al.	2510.08055	null
2025-10-09	Recycling Pretrained Checkpoints: Orthogonal Growth of Mixture-of-Experts for Efficient Large Language Model Pre-Training	Ruizhe Wang et.al.	2510.08008	null
2025-10-09	Multilingual Knowledge Graph Completion via Efficient Multilingual Knowledge Sharing	Cunli Mao et.al.	2510.07736	null
2025-10-09	Mutual Learning for Hashing: Unlocking Strong Hash Functions from Weak Supervision	Xiaoxu Ma et.al.	2510.07703	null
2025-10-09	LiveThinking: Enabling Real-Time Efficient Reasoning for AI-Powered Livestreaming via Reinforcement Learning	Yuhan Sun et.al.	2510.07685	null
2025-10-08	MoGU: Mixture-of-Gaussians with Uncertainty-based Gating for Time Series Forecasting	Yoli Shavit et.al.	2510.07459	null
2025-10-08	Less is More: Strategic Expert Selection Outperforms Ensemble Complexity in Traffic Forecasting	Walid Guettala et.al.	2510.07426	null
2025-10-08	Guided by the Experts: Provable Feature Learning Dynamic of Soft-Routed Mixture-of-Experts	Fangshuo Liao et.al.	2510.07205	null
2025-10-08	A Bridge from Audio to Video: Phoneme-Viseme Alignment Allows Every Face to Speak Multiple Languages	Zibo Su et.al.	2510.06612	null
2025-10-09	SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation	Shuang Cheng et.al.	2510.06303	null
2025-10-06	Reproducibility Study of "XRec: Large Language Models for Explainable Recommendation"	Ranjan Mishra et.al.	2510.06275	null
2025-10-08	Barbarians at the Gate: How AI is Upending Systems Research	Audrey Cheng et.al.	2510.06189	null
2025-10-07	Rasterized Steered Mixture of Experts for Efficient 2D Image Regression	Yi-Hsin Li et.al.	2510.05814	null
2025-10-07	MSF-SER: Enriching Acoustic Modeling with Multi-Granularity Semantics for Speech Emotion Recognition	Haoxun Li et.al.	2510.05749	null
2025-10-07	Orders in Chaos: Enhancing Large-Scale MoE LLM Serving with Data Movement Forecasting	Zhongkai Yu et.al.	2510.05497	null
2025-10-06	Stratum: System-Hardware Co-Design with Tiered Monolithic 3D-Stackable DRAM for Efficient MoE Serving	Yue Pan et.al.	2510.05245	null
2025-10-06	REN: Anatomically-Informed Mixture-of-Experts for Interstitial Lung Disease Diagnosis	Alec K. Peltekian et.al.	2510.04923	null
2025-10-06	LMM-Incentive: Large Multimodal Model-based Incentive Design for User-Generated Content in Web 3.0	Jinbo Wen et.al.	2510.04765	null
2025-10-06	Multilingual Routing in Mixture-of-Experts	Lucas Bandarkar et.al.	2510.04694	null
2025-10-06	Improving Multimodal Brain Encoding Model with Dynamic Subject-awareness Routing	Xuanhua Yin et.al.	2510.04670	null
2025-10-05	HoRA: Cross-Head Low-Rank Adaptation with Joint Hypernetworks	Nghiem T. Diep et.al.	2510.04295	null
2025-10-05	SliceMoE: Routing Embedding Slices Instead of Tokens for Fine-Grained and Balanced Transformer Scaling	Harshil Vejendla et.al.	2510.04286	null
2025-10-05	MoME: Mixture of Matryoshka Experts for Audio-Visual Speech Recognition	Umberto Cappellazzo et.al.	2510.04136	null
2025-10-03	Mixture of Many Zero-Compute Experts: A High-Rate Quantization Theory Perspective	Yehuda Dar et.al.	2510.03151	null
2025-10-02	ElasticMoE: An Efficient Auto Scaling Method for Mixture-of-Experts Models	Gursimran Singh et.al.	2510.02613	null
2025-10-02	UpSafe$^\circ$C: Upcycling for Controllable Safety in Large Language Models	Yuhao Sun et.al.	2510.02194	null
2025-10-02	LadderMoE: Ladder-Side Mixture of Experts Adapters for Bronze Inscription Recognition	Rixin Zhou et.al.	2510.01651	null
2025-10-01	Dirichlet-Prior Shaping: Guiding Expert Specialization in Upcycled MoEs	Leyla Mirvakhabova et.al.	2510.01185	null
2025-10-01	Learning Compact Representations of LLM Abilities via Item Response Theory	Jianhao Chen et.al.	2510.00844	null
2025-10-01	Graph Integrated Multimodal Concept Bottleneck Model	Jiakai Lin et.al.	2510.00701	null
2025-10-01	FAME: Adaptive Functional Attention with Expert Routing for Function-on-Function Regression	Yifei Gao et.al.	2510.00621	null
2025-10-01	Adaptive Shared Experts with LoRA-Based Mixture of Experts for Multi-Task Learning	Minghao Yang et.al.	2510.00570	null
2025-09-30	FlowMoE: A Scalable Pipeline Scheduling Framework for Distributed Mixture-of-Experts Training	Yunqi Gao et.al.	2510.00207	null
2025-09-30	Training Matryoshka Mixture-of-Experts for Elastic Inference-Time Expert Utilization	Yaoxiang Wang et.al.	2509.26520	null
2025-09-30	Nephrobase Cell+: Multimodal Single-Cell Foundation Model for Decoding Kidney Biology	Chenyu Li et.al.	2509.26223	null
2025-09-30	Towards Unified Multimodal Misinformation Detection in Social Media: A Benchmark Dataset and Baseline	Haiyang Li et.al.	2509.25991	null
2025-09-30	UniMMAD: Unified Multi-Modal and Multi-Class Anomaly Detection via MoE-Driven Feature Decompression	Yuan Zhao et.al.	2509.25934	null
2025-09-30	Understanding the Mixture-of-Experts with Nadaraya-Watson Kernel	Chuanyang Zheng et.al.	2509.25913	null
2025-10-01	A Multimodal LLM Approach for Visual Question Answering on Multiparametric 3D Brain MRI	Arvind Murari Vepa et.al.	2509.25889	null
2025-09-30	Collaborative Compression for Large-Scale MoE Deployment on Edge	Yixiao Chen et.al.	2509.25689	null
2025-09-30	LD-MoLE: Learnable Dynamic Routing for Mixture of LoRA Experts	Yuan Zhuang et.al.	2509.25684	null
2025-09-30	Guiding Mixture-of-Experts with Temporal Multimodal Interactions	Xing Han et.al.	2509.25678	null
2025-09-29	K-Prism: A Knowledge-Guided and Prompt Integrated Universal Medical Image Segmentation Model	Bangwei Guo et.al.	2509.25594	null
2025-09-29	GRACE-MoE: Grouping and Replication with Locality-Aware Routing for Efficient Distributed MoE Inference	Yu Han et.al.	2509.25041	null
2025-09-29	LEAF: A Robust Expert-Based Framework for Few-Shot Continual Event Detection	Bao-Ngoc Dao et.al.	2509.24547	null
2025-09-29	One-Prompt Strikes Back: Sparse Mixture of Experts for Prompt-based Continual Learning	Minh Le et.al.	2509.24483	null
2025-09-29	Muon: Training and Trade-offs with Latent Attention and MoE	Sushant Mehta et.al.	2509.24406	null
2025-09-29	LLaDA-MoE: A Sparse MoE Diffusion Language Model	Fengqi Zhu et.al.	2509.24389	null
2025-09-29	Uni-NTFM: A Unified Foundation Model for EEG Signal Representation Learning	Zhisheng Chen et.al.	2509.24222	null
2025-09-28	HunyuanImage 3.0 Technical Report	Siyu Cao et.al.	2509.23951	null
2025-09-28	Beyond Benchmarks: Understanding Mixture-of-Experts Models through Internal Mechanisms	Jiahao Ying et.al.	2509.23933	null
2025-09-28	Bayesian Mixture-of-Experts: Towards Making LLMs Know What They Don't Know	Albus Yizhuo Li et.al.	2509.23830	null
2025-09-28	A Modality-Tailored Graph Modeling Framework for Urban Region Representation via Contrastive Learning	Yaya Zhao et.al.	2509.23772	null
2025-09-26	Dynamic Experts Search: Enhancing Reasoning in Mixture-of-Experts LLMs at Test Time	Yixuan Han et.al.	2509.22572	null
2025-09-26	Learning to Ball: Composing Policies for Long-Horizon Basketball Moves	Pei Xu et.al.	2509.22442	null
2025-09-26	Role-Aware Multi-modal federated learning system for detecting phishing webpages	Bo Wang et.al.	2509.22369	null
2025-09-26	HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space	Ke Li et.al.	2509.22299	null
2025-09-26	Unlocking the Power of Mixture-of-Experts for Task-Aware Time Series Analytics	Xingjian Wu et.al.	2509.22279	null
2025-09-26	MultiCrafter: High-Fidelity Multi-Subject Generation via Spatially Disentangled Attention and Identity-Aware Reinforcement Learning	Tao Wu et.al.	2509.21953	null
2025-09-26	Elastic MoE: Unlocking the Inference-Time Scalability of Mixture-of-Experts	Naibin Gu et.al.	2509.21892	null
2025-09-26	ChaosNexus: A Foundation Model for Universal Chaotic System Forecasting with Multi-scale Representations	Chang Liu et.al.	2509.21802	null
2025-09-26	LongScape: Advancing Long-Horizon Embodied World Models with Context-Aware MoE	Yu Shang et.al.	2509.21790	null
2025-09-25	Distributed Specialization: Rare-Token Neurons in Large Language Models	Jing Liu et.al.	2509.21163	null
2025-09-26	Expanding Reasoning Potential in Foundation Model by Learning Diverse Chains of Thought Patterns	Xuemiao Zhang et.al.	2509.21124	null
2025-09-25	Physics Informed Neural Networks for design optimisation of diamond particle detectors for charged particle fast-tracking at high luminosity hadron colliders	Alessandro Bombini et.al.	2509.21123	null
2025-09-24	Dynamic Reasoning Chains through Depth-Specialized Mixture-of-Experts in Transformer Architectures	Sampurna Roy et.al.	2509.20577	null
2025-09-24	SHMoAReg: Spark Deformable Image Registration via Spatial Heterogeneous Mixture of Experts and Attention Heads	Yuxi Zheng et.al.	2509.20073	null
2025-09-24	Faster, Smaller, and Smarter: Task-Aware Expert Merging for Online MoE Inference	Ziyi Han et.al.	2509.19781	null
2025-09-23	DevFD: Developmental Face Forgery Detection by Learning Shared and Orthogonal LoRA Subspaces	Tianshuo Zhang et.al.	2509.19230	null
2025-09-23	Frequency-Domain Decomposition and Recomposition for Robust Audio-Visual Segmentation	Yunzhe Shen et.al.	2509.18912	null
2025-09-23	LongCat-Flash-Thinking Technical Report	Meituan LongCat Team et.al.	2509.18883	null
2025-09-23	PIE: Perception and Interaction Enhanced End-to-End Motion Planning for Autonomous Driving	Chengran Yuan et.al.	2509.18609	null
2025-09-23	Symphony-MoE: Harmonizing Disparate Pre-trained Models into a Coherent Mixture-of-Experts	Qi Wang et.al.	2509.18542	null
2025-09-23	StableGuard: Towards Unified Copyright Protection and Tamper Localization in Latent Diffusion Models	Haoxin Yang et.al.	2509.17993	null
2025-09-23	Optimizing Inference in Transformer-Based Models: A Multi-Method Benchmark	Siu Hang Ho et.al.	2509.17894	null
2025-09-22	Expert-as-a-Service: Towards Efficient, Scalable, and Robust Large-scale MoE Serving	Ziming Liu et.al.	2509.17863	null
2025-09-22	Attention-based Mixture of Experts for Robust Speech Deepfake Detection	Viola Negroni et.al.	2509.17585	null
2025-09-22	Robust Mixture Models for Algorithmic Fairness Under Latent Heterogeneity	Siqi Li et.al.	2509.17411	null
2025-09-21	MoEs Are Stronger than You Think: Hyper-Parallel Inference Scaling with RoE	Soheil Zibakhsh et.al.	2509.17238	null
2025-09-21	CoBEVMoE: Heterogeneity-aware Feature Fusion with Dynamic Mixture-of-Experts for Collaborative Perception	Lingzhao Kong et.al.	2509.17107	null
2025-09-21	Dynamic Expert Specialization: Towards Catastrophic Forgetting-Free Multi-Domain MoE Adaptation	Junzhuo Li et.al.	2509.16882	null
2025-09-20	KungfuBot2: Learning Versatile Motion Skills for Humanoid Whole-Body Control	Jinrui Han et.al.	2509.16638	null
2025-09-19	DiEP: Adaptive Mixture-of-Experts Compression through Differentiable Expert Pruning	Sikai Bai et.al.	2509.16105	null
2025-09-19	MoE-CE: Enhancing Generalization for Deep Learning based Channel Estimation via a Mixture-of-Experts Framework	Tianyu Li et.al.	2509.15964	null
2025-09-19	pFedSAM: Personalized Federated Learning of Segment Anything Model for Medical Image Segmentation	Tong Wang et.al.	2509.15638	null
2025-09-19	MEC-Quant: Maximum Entropy Coding for Extremely Low Bit Quantization-Aware Training	Junbiao Pang et.al.	2509.15514	null
2025-09-18	Beyond Spurious Signals: Debiasing Multimodal Large Language Models via Counterfactual Inference and Adaptive Expert Routing	Zichen Wu et.al.	2509.15361	null
2025-09-18	Super-Linear: A Lightweight Pretrained Mixture of Linear Experts for Time Series Forecasting	Liran Nochumsohn et.al.	2509.15105	null
2025-09-18	Adaptive LoRA Experts Allocation and Selection for Federated Fine-Tuning	Lei Wang et.al.	2509.15087	null
2025-09-18	EchoVLM: Dynamic Mixture-of-Experts Vision-Language Model for Universal Ultrasound Intelligence	Chaoyin She et.al.	2509.14977	null
2025-09-18	FURINA: Free from Unmergeable Router via LINear Aggregation of mixed experts	Jiayi Han et.al.	2509.14900	null
2025-09-18	CollabVLA: Self-Reflective Vision-Language-Action Model Dreaming Together with Human	Nan Sun et.al.	2509.14889	null
2025-09-17	CSMoE: An Efficient Remote Sensing Foundation Model with Soft Mixture-of-Experts	Leonard Hackel et.al.	2509.14104	null
2025-09-18	SAIL-VL2 Technical Report	Weijie Yin et.al.	2509.14033	null
2025-09-17	Semi-MoE: Mixture-of-Experts meets Semi-Supervised Histopathology Segmentation	Nguyen Lan Vi Vu et.al.	2509.13834	null
2025-09-18	Mixture-of-Experts Framework for Field-of-View Enhanced Signal-Dependent Binauralization of Moving Talkers	Manan Mittal et.al.	2509.13548	null
2025-09-18	GLAD: Global-Local Aware Dynamic Mixture-of-Experts for Multi-Talker ASR	Yujie Guo et.al.	2509.13093	null
2025-09-16	Dual-Stage Reweighted MoE for Long-Tailed Egocentric Mistake Detection	Boyu Han et.al.	2509.12990	null
2025-09-16	Bridging Perception and Planning: Towards End-to-End Planning for Signal Temporal Logic Tasks	Bowen Ye et.al.	2509.12813	null
2025-09-16	MEGAN: Mixture of Experts for Robust Uncertainty Estimation in Endoscopy Videos	Damola Agbelese et.al.	2509.12772	null
2025-09-17	NavMoE: Hybrid Model- and Learning-based Traversability Estimation for Local Navigation via Mixture of Experts	Botao He et.al.	2509.12747	null
2025-09-16	AsyMoE: Leveraging Modal Asymmetry for Enhanced Expert Specialization in Large Vision-Language Models	Heng Zhang et.al.	2509.12715	null
2025-10-24	Efficient Multimodal Streaming Recommendation via Expandable Side Mixture-of-Experts	Yunke Qu et.al.	2508.05993	null
2025-07-23	Towards Greater Leverage: Scaling Laws for Efficient Mixture-of-Experts Language Models	Changxin Tian et.al.	2507.17702	null
2025-07-23	Mammo-Mamba: A Hybrid State-Space and Transformer Architecture with Sequential Mixture of Experts for Multi-View Mammography	Farnoush Bayatmakou et.al.	2507.17662	null
2025-07-23	InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation	Shuai Yang et.al.	2507.17520	null
2025-07-23	Dynamic-DINO: Fine-Grained Mixture of Experts Tuning for Real-time Open-Vocabulary Object Detection	Yehao Lu et.al.	2507.17436	null
2025-07-23	A Versatile Pathology Co-pilot via Reasoning Enhanced Multimodal Large Language Model	Zhe Xu et.al.	2507.17303	null
2025-07-23	BrownoutServe: SLO-Aware Inference Serving under Bursty Workloads for MoE-based LLMs	Jianmin Hu et.al.	2507.17133	null
2025-07-22	GATEBLEED: Exploiting On-Core Accelerator Power Gating for High Performance & Stealthy Attacks on AI	Joshua Kalyanapu et.al.	2507.17033	null
2025-07-22	Mixture-of-Expert Variational Autoencoders for Cross-Modality Embedding of Type Ia Supernova Data	Yunyi Shen et.al.	2507.16817	null
2025-07-22	Reducing GPU Memory Fragmentation via Spatio-Temporal Planning for Efficient Large-Scale Model Training	Zixiao Huang et.al.	2507.16274	null
2025-07-21	Applying multimodal learning to Classify transient Detections Early (AppleCiDEr) I: Data set, methods, and infrastructure	Alexandra Junell et.al.	2507.16088	null
2025-07-21	Just Ask for Music (JAM): Multimodal and Personalized Natural Language Music Recommendation	Alessandro B. Melchiorre et.al.	2507.15826	null
2025-07-21	The New LLM Bottleneck: A Systems Perspective on Latent Attention and Mixture-of-Experts	Sungmin Yun et.al.	2507.15465	null
2025-07-21	Universal crystal material property prediction via multi-view geometric fusion in graph transformers	Liang Zhang et.al.	2507.15303	null
2025-07-20	CoMoCAVs: Cohesive Decision-Guided Motion Planning for Connected and Autonomous Vehicles with Multi-Policy Reinforcement Learning	Pan Hu et.al.	2507.14903	null
2025-07-23	GEMINUS: Dual-aware Global and Scene-Adaptive Mixture-of-Experts for End-to-End Autonomous Driving	Chi Wan et.al.	2507.14456	null
2025-07-18	SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing	Yingying Zhang et.al.	2507.13812	null
2025-07-17	Apple Intelligence Foundation Language Models: Tech Report 2025	Hanzhi Zhou et.al.	2507.13575	null
2025-07-17	R^2MoE: Redundancy-Removal Mixture of Experts for Lifelong Concept Learning	Xiaohan Guo et.al.	2507.13107	null
2025-07-16	Astro-MoE: Mixture of Experts for Multiband Astronomical Time Series	Martina Cádiz-Leyton et.al.	2507.12611	null
2025-07-16	Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models	Gen Luo et.al.	2507.12566	null
2025-07-17	Mixture of Raytraced Experts	Andrea Perin et.al.	2507.12419	null
2025-07-16	CorrMoE: Mixture of Experts with De-stylization Learning for Cross-Scene and Cross-Domain Correspondence Pruning	Peiwen Xia et.al.	2507.11834	null
2025-07-15	Mixture of Experts in Large Language Models	Danyang Zhang et.al.	2507.11181	null
2025-07-15	Atmos-Bench: 3D Atmospheric Structures for Climate Insight	Tianchi Xu et.al.	2507.11085	null
2025-07-14	DeepSeek: Paradigm Shifts and Technical Evolution in Large AI Models	Luolin Xiong et.al.	2507.09955	null
2025-07-14	ESG-Net: Event-Aware Semantic Guided Network for Dense Audio-Visual Event Localization	Huilai Li et.al.	2507.09945	null
2025-07-14	Multi-residual Mixture of Experts Learning for Cooperative Control in Multi-vehicle Systems	Vindula Jayawardana et.al.	2507.09836	null
2025-07-13	Explainable AI in Genomics: Transcription Factor Binding Site Prediction with Mixture of Experts	Aakash Tripathi et.al.	2507.09754	null
2025-07-13	Inter2Former: Dynamic Hybrid Attention for Efficient High-Precision Interactive	You Huang et.al.	2507.09612	null
2025-07-12	PPJudge: Towards Human-Aligned Assessment of Artistic Painting Process	Shiqi Jiang et.al.	2507.09242	null
2025-07-11	BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity	Chenyang Song et.al.	2507.08771	null
2025-07-11	CircFormerMoE: An End-to-End Deep Learning Framework for Circular RNA Splice Site Detection and Pairing in Plant Genomes	Tianyou Jiang et.al.	2507.08542	null
2025-07-11	White-Basilisk: A Hybrid Model for Code Vulnerability Detection	Ioannis Lamprou et.al.	2507.08540	null
2025-07-15	KAT-V1: Kwai-AutoThink Technical Report	Zizheng Zhan et.al.	2507.08297	null
2025-07-11	Data-Driven Dimensional Synthesis of Diverse Planar Four-bar Function Generation Mechanisms via Direct Parameterization	Woon Ryong Kim et.al.	2507.08269	null
2025-07-10	MoSE: Skill-by-Skill Mixture-of-Expert Learning for Autonomous Driving	Lu Xu et.al.	2507.07818	null
2025-07-10	When Large Language Models Meet Law: Dual-Lens Taxonomy, Technical Advances, and Ethical Governance	Peizhang Shao et.al.	2507.07748	null
2025-07-09	Leveraging Manifold Embeddings for Enhanced Graph Transformer Representations and Learning	Ankit Jyothish et.al.	2507.07335	null
2025-07-08	Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate	A. Bochkov et.al.	2507.07129	null
2025-07-09	4KAgent: Agentic Any Image to 4K Super-Resolution	Yushen Zuo et.al.	2507.07105	null
2025-07-11	FlexOlmo: Open Language Models for Flexible Data Use	Weijia Shi et.al.	2507.07024	null
2025-07-09	Deep Disentangled Representation Network for Treatment Effect Estimation	Hui Meng et.al.	2507.06650	null
2025-07-09	SlimCaching: Edge Caching of Mixture-of-Experts for Distributed Inference	Qian Chen et.al.	2507.06567	null
2025-07-09	MoFE-Time: Mixture of Frequency Domain Experts for Time-Series Forecasting Models	Yiwen Liu et.al.	2507.06502	null
2025-07-08	Mamba Goes HoME: Hierarchical Soft Mixture-of-Experts for 3D Medical Image Segmentation	Szymon Płotka et.al.	2507.06363	null
2025-07-08	Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis	Xintong Hu et.al.	2507.06116	null
2025-07-09	A Survey on Prompt Tuning	Zongqian Li et.al.	2507.06085	null
2025-07-08	Remember Past, Anticipate Future: Learning Continual Multimodal Misinformation Detectors	Bing Wang et.al.	2507.05939	null
2025-07-08	What You Have is What You Track: Adaptive and Robust Multimodal Tracking	Yuedong Tan et.al.	2507.05899	null
2025-07-08	Omni-Router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition	Zijin Gu et.al.	2507.05724	null
2025-07-08	Efficient Training of Large-Scale AI Models Through Federated Mixture-of-Experts: A System-Level Approach	Xiaobing Chen et.al.	2507.05685	null
2025-07-08	City-Level Foreign Direct Investment Prediction with Tabular Learning on Judicial Data	Tianxing Wu et.al.	2507.05651	null
2025-07-07	QMoE: A Quantum Mixture of Experts Framework for Scalable Quantum Neural Networks	Hoang-Quan Nguyen et.al.	2507.05190	null
2025-07-07	NTSFormer: A Self-Teaching Graph Transformer for Multimodal Cold-Start Node Classification	Jun Hu et.al.	2507.04870	null
2025-07-07	DRAE: Dynamic Retrieval-Augmented Expert Networks for Lifelong Learning and Task Adaptation in Robotics	Yayu Long et.al.	2507.04661	null
2025-07-08	UGG-ReID: Uncertainty-Guided Graph Model for Multi-Modal Object Re-Identification	Xixi Wan et.al.	2507.04638	null
2025-07-07	Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts	Yun Wang et.al.	2507.04631	null
2025-07-05	Towards Accurate and Efficient 3D Object Detection for Autonomous Driving: A Mixture of Experts Computing System on Edge	Linshen Liu et.al.	2507.04123	null
2025-07-05	From Query to Explanation: Uni-RAG for Multi-Modal Retrieval-Augmented Learning in STEM	Xinyi Wu et.al.	2507.03868	null
2025-07-04	Decoupled Relative Learning Rate Schedules	Jan Ludziejewski et.al.	2507.03526	null
2025-07-03	Neural Inhibition Improves Dynamic Routing and Mixture of Experts	Will Y. Zou et.al.	2507.03221	null
2025-07-03	System-performance and cost modeling of Large Language Model training and inference	Wenzhe Guo et.al.	2507.02456	null
2025-07-03	NLP4Neuro: Sequence-to-sequence learning for neural population decoding	Jacob J. Morra et.al.	2507.02264	null
2025-07-02	MoIRA: Modular Instruction Routing Architecture for Multi-Task Robotics	Dmytro Kuzmenko et.al.	2507.01843	null
2025-07-02	Mixtures of Neural Network Experts with Application to Phytoplankton Flow Cytometry Data	Ethan Pawl et.al.	2507.01375	null
2025-07-02	Long-Tailed Distribution-Aware Router For Mixture-of-Experts in Large Vision-Language Model	Chaoxiang Cai et.al.	2507.01351	null
2025-07-02	Dynamical Multimodal Fusion with Mixture-of-Experts for Localizations	Bohao Wang et.al.	2507.01337	null
2025-07-02	ExPaMoE: An Expandable Parallel Mixture of Experts for Continual Test-Time Adaptation	JianChao Zhao et.al.	2507.00502	null
2025-07-01	MoNE: Replacing Redundant Experts with Lightweight Novices for Structured Pruning of MoE	Geng Zhang et.al.	2507.00390	null
2025-06-30	MotionGPT3: Human Motion as a Second Modality	Bingfan Zhu et.al.	2506.24086	null
2025-06-30	MReg: A Novel Regression Model with MoE-based Video Feature Mining for Mitral Regurgitation Diagnosis	Zhe Liu et.al.	2506.23648	null
2025-06-30	Towards Building Private LLMs: Exploring Multi-Node Expert Parallelism on Apple Silicon for Mixture-of-Experts Large Language Model	Mu-Chi Chen et.al.	2506.23635	null
2025-06-29	Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging	Lujun Li et.al.	2506.23266	null
2025-06-29	External Data-Enhanced Meta-Representation for Adaptive Probabilistic Load Forecasting	Haoran Li et.al.	2506.23201	null
2025-06-29	Hierarchical Corpus-View-Category Refinement for Carotid Plaque Risk Grading in Ultrasound	Zhiyuan Zhu et.al.	2506.23108	null
2025-07-01	Hecto: Modular Sparse Experts for Adaptive and Interpretable Reasoning	Sanskar Pandey et.al.	2506.22919	null
2025-06-27	Towards Distributed Neural Architectures	Aditya Cowsik et.al.	2506.22389	null
2025-06-27	MPipeMoE: Memory Efficient MoE for Pre-trained Models with Adaptive Pipeline Parallelism	Zheng Zhang et.al.	2506.22175	null
2025-06-27	DeepTalk: Towards Seamless and Smart Speech Interaction with Adaptive Modality-Specific MoE	Hang Shao et.al.	2506.21864	null
2025-06-26	Latent Prototype Routing: Achieving Near-Perfect Load Balancing in Mixture-of-Experts	Jiajie Yang et.al.	2506.21328	null
2025-06-26	Learning to Skip the Middle Layers of Transformers	Tim Lawson et.al.	2506.21103	null
2025-06-26	Little By Little: Continual Learning via Self-Activated Sparse Mixture-of-Rank Adaptive Learning	Haodong Lu et.al.	2506.21035	null
2025-06-26	EVA: Mixture-of-Experts Semantic Variant Alignment for Compositional Zero-Shot Learning	Xiao Zhang et.al.	2506.20986	null
2025-06-25	Opportunistic Osteoporosis Diagnosis via Texture-Preserving Self-Supervision, Mixture of Experts and Multi-Task Integration	Jiaxing Huang et.al.	2506.20282	null
2025-06-23	Multimodal Anomaly Detection with a Mixture-of-Experts	Christoph Willibald et.al.	2506.19077	null
2025-06-23	Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models	Zihan Wang et.al.	2506.18945	null
2025-06-23	Shift Happens: Mixture of Experts based Continual Adaptation in Federated Learning	Rahul Atul Bhope et.al.	2506.18789	null
2025-06-23	An Audio-centric Multi-task Learning Framework for Streaming Ads Targeting on Spotify	Shivam Verma et.al.	2506.18735	null
2025-06-23	Security Assessment of DeepSeek and GPT Series Models against Jailbreak Attacks	Xiaodong Wu et.al.	2506.18543	null
2025-06-23	SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation	Zichong Li et.al.	2506.18349	null
2025-06-23	Sharpening the Spear: Adaptive Expert-Guided Adversarial Attack Against DRL-based Autonomous Driving Policies	Junchao Fan et.al.	2506.18304	null
2025-06-22	Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection	Zheng Zhan et.al.	2506.18145	null
2025-06-21	Incorporating Rather Than Eliminating: Achieving Fairness for Skin Disease Diagnosis Through Group-Specific Expert	Gelei Xu et.al.	2506.17787	null
2025-06-21	Physics-informed mixture of experts network for interpretable battery degradation trajectory computation amid second-life complexities	Xinghao Huang et.al.	2506.17755	null
2025-06-21	PDC-Net: Pattern Divide-and-Conquer Network for Pelvic Radiation Injury Segmentation	Xinyu Xiong et.al.	2506.17712	null
2025-06-20	SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification	Zhenglin Lai et.al.	2506.17368	null
2025-06-19	FLAME: Towards Federated Fine-Tuning Large Language Models Through Adaptive SMoE	Khiem Le et.al.	2506.16600	null
2025-06-19	Optimizing MoE Routers: Design, Implementation, and Evaluation in Transformer Models	Daniel Fidel Harvey et.al.	2506.16419	null
2025-06-17	Scaling Intelligence: Designing Data Centers for Next-Gen Language Models	Jesmin Jahan Tithi et.al.	2506.15006	null
2025-06-17	NeuroMoE: A Transformer-Based Mixture-of-Experts Framework for Multi-Modal Neurological Disorder Classification	Wajih Hassan Raza et.al.	2506.14970	null
2025-06-17	GMT: General Motion Tracking for Humanoid Whole-Body Control	Zixuan Chen et.al.	2506.14770	null
2025-06-17	Exploring Speaker Diarization with Mixture of Experts	Gaobin Yang et.al.	2506.14750	null
2025-06-18	Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs	Ling Team et.al.	2506.14731	null
2025-06-17	GuiLoMo: Allocating Expert Number and Rank for LoRA-MoE via Bilevel Optimization with GuidedSelection Vectors	Hengyuan Zhang et.al.	2506.14646	link
2025-06-17	Single-Example Learning in a Mixture of GPDMs with Latent Geometries	Jesse St. Amand et.al.	2506.14563	null
2025-06-17	MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models	Hongyu Wang et.al.	2506.14435	null
2025-06-16	Load Balancing Mixture of Experts with Similarity Preserving Routers	Nabil Omi et.al.	2506.14038	null
2025-06-16	GRaD-Nav++: Vision-Language Model Enabled Visual Drone Navigation with Gaussian Radiance Fields and Differentiable Dynamics	Qianzhong Chen et.al.	2506.14009	null
2025-06-16	MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention	MiniMax et.al.	2506.13585	link
2025-06-16	Mixture of Weight-shared Heterogeneous Group Attention Experts for Dynamic Token-wise KV Optimization	Guanghui Song et.al.	2506.13541	null
2025-06-16	EAQuant: Enhancing Post-Training Quantization for MoE Models via Expert-Aware Optimization	Zhongqian Fu et.al.	2506.13329	link
2025-06-16	Breaking Thought Patterns: A Multi-Dimensional Reasoning Framework for LLMs	Xintong Tang et.al.	2506.13192	null
2025-06-15	Serving Large Language Models on Huawei CloudMatrix384	Pengfei Zuo et.al.	2506.12708	null
2025-06-14	Automatic Expert Discovery in LLM Upcycling via Sparse Interpolated Mixture-of-Experts	Shengzhuang Chen et.al.	2506.12597	null
2025-06-14	Topology-Assisted Spatio-Temporal Pattern Disentangling for Scalable MARL in Large-scale Autonomous Traffic Control	Rongpeng Li et.al.	2506.12453	null
2025-06-17	HarMoEny: Efficient Multi-GPU Inference of MoE Models	Zachary Doucet et.al.	2506.12417	null
2025-06-14	Group then Scale: Dynamic Mixture-of-Experts Multilingual Language Model	Chong Li et.al.	2506.12388	null
2025-06-13	Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources?	Houyi Li et.al.	2506.12119	null
2025-06-13	Structural Similarity-Inspired Unfolding for Lightweight Image Super-Resolution	Zhangkai Ni et.al.	2506.11823	link
2025-06-12	Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts	Zaijing Li et.al.	2506.10357	null
2025-06-11	GigaChat Family: Efficient Russian Language Modeling Through Mixture of Experts Architecture	GigaChat team et.al.	2506.09440	null
2025-06-11	DIVE into MoE: Diversity-Enhanced Reconstruction of Large Language Models from Dense into Mixture-of-Experts	Yuchen Feng et.al.	2506.09351	null
2025-06-10	CoQMoE: Co-Designed Quantization and Computation Orchestration for Mixture-of-Experts Vision Transformer on FPGA	Jiale Dong et.al.	2506.08496	link
2025-06-11	MedMoE: Modality-Specialized Mixture of Experts for Medical Vision-Language Understanding	Shivang Chopra et.al.	2506.08356	null
2025-06-11	STAMImputer: Spatio-Temporal Attention MoE for Traffic Data Imputation	Yiming Wang et.al.	2506.08054	link
2025-06-09	A Two-Phase Deep Learning Framework for Adaptive Time-Stepping in High-Speed Flow Modeling	Jacob Helwig et.al.	2506.07969	link
2025-06-09	M2Restore: Mixture-of-Experts-based Mamba-CNN Fusion Framework for All-in-One Image Restoration	Yongzhen Wang et.al.	2506.07814	null
2025-06-11	MIRA: Medical Time Series Foundation Model for Real-World Health Data	Hao Li et.al.	2506.07584	null
2025-06-11	MoE-MLoRA for Multi-Domain CTR Prediction: Efficient Adaptation with Expert Specialization	Ken Yaggel et.al.	2506.07563	link
2025-06-09	MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts	Wei Tao et.al.	2506.07533	null
2025-06-09	MoE-GPS: Guidelines for Prediction Strategy for Dynamic Expert Duplication in MoE Load Balancing	Haiyue Ma et.al.	2506.07366	null
2025-06-08	UNO: Unified Self-Supervised Monocular Odometry for Platform-Agnostic Deployment	Wentao Zhao et.al.	2506.07013	null
2025-06-07	High-Fidelity Scientific Simulation Surrogates via Adaptive Implicit Neural Representations	Ziwei Li et.al.	2506.06858	null
2025-06-07	Breaking Data Silos: Towards Open and Scalable Mobility Foundation Models via Generative Continual Learning	Yuan Yuan et.al.	2506.06694	null
2025-06-06	Bridging Perception and Action: Spatially-Grounded Mid-Level Representations for Robot Generalization	Jonathan Yang et.al.	2506.06196	null
2025-06-06	MoA: Heterogeneous Mixture of Adapters for Parameter-Efficient Fine-Tuning of Large Language Models	Jie Cao et.al.	2506.05928	null
2025-06-06	dots.llm1 Technical Report	Bi Huo et.al.	2506.05767	null
2025-06-05	Mixture-of-Experts Meets In-Context Reinforcement Learning	Wenhao Wu et.al.	2506.05426	null
2025-06-05	Lifelong Evolution: Collaborative Learning between Large and Small Language Models for Continuous Emergent Fake News Detection	Ziyi Zhou et.al.	2506.04739	null
2025-06-05	FlashDMoE: Fast Distributed MoE in a Single Kernel	Osayamen Jonathan Aimuyo et.al.	2506.04667	link
2025-06-04	Resolving Task Objective Conflicts in Unified Multimodal Understanding and Generation via Task-Aware Mixture-of-Experts	Jiaxing Zhang et.al.	2506.03591	null
2025-06-04	PC-MoE: Memory-Efficient and Privacy-Preserving Collaborative Training for Mixture-of-Experts LLMs	Ze Yu Zhang et.al.	2506.02965	null
2025-06-03	Scaling Fine-Grained MoE Beyond 50B Parameters: Empirical Evaluation and Practical Insights	Jakub Krajewski et.al.	2506.02890	null
2025-06-03	Brain-Like Processing Pathways Form in Models With Heterogeneous Experts	Jack Cook et.al.	2506.02813	null
2025-06-04	MemoryOut: Learning Principal Features via Multimodal Sparse Filtering Network for Semi-supervised Video Anomaly Detection	Juntong Li et.al.	2506.02535	null
2025-06-03	MidPO: Dual Preference Optimization for Safety and Helpfulness in Large Language Models via a Mixture of Experts Framework	Yupeng Qi et.al.	2506.02460	null
2025-05-31	Enhancing Multimodal Continual Instruction Tuning with BranchLoRA	Duzhen Zhang et.al.	2506.02041	null
2025-06-02	SPACE: Your Genomic Profile Predictor is a Powerful DNA Foundation Model	Zhao Yang et.al.	2506.01833	link
2025-06-02	Mixture of Experts Provably Detect and Learn the Latent Cluster Structure in Gradient-Based Learning	Ryotaro Kawata et.al.	2506.01656	null
2025-06-02	DeepSeek in Healthcare: A Survey of Capabilities, Risks, and Clinical Applications of Open-Source Large Language Models	Jiancheng Ye et.al.	2506.01257	null
2025-06-01	Unlocking Personalized Knowledge in Federated Large Language Model: The Power of Mixture of Experts	Fan Liu et.al.	2506.00965	null
2025-05-30	Mixture-of-Experts for Personalized and Semantic-Aware Next Location Prediction	Shuai Liu et.al.	2505.24597	null
2025-05-30	Decoding Knowledge Attribution in Mixture-of-Experts: A Framework of Basic-Refinement Collaboration and Efficiency Analysis	Junzhuo Li et.al.	2505.24593	null
2025-05-30	Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer	Yilun Kong et.al.	2505.24378	link
2025-05-30	GradPower: Powering Gradients for Faster Language Model Pre-Training	Mingze Wang et.al.	2505.24275	null
2025-05-30	On the Expressive Power of Mixture-of-Experts for Structured Complex Tasks	Mingze Wang et.al.	2505.24205	null
2025-05-29	Point-MoE: Towards Cross-Domain Generalization in 3D Semantic Segmentation via Mixture-of-Experts	Xuweiyi Chen et.al.	2505.23926	null
2025-06-03	Noise-Robustness Through Noise: Asymmetric LoRA Adaption with Poisoning Expert	Zhaokun Wang et.al.	2505.23868	null
2025-05-29	From Knowledge to Noise: CTIM-Rover and the Pitfalls of Episodic Memory in Software Engineering Agents	Tobias Lindenbauer et.al.	2505.23422	link
2025-05-29	Context-Aware Semantic Communication for the Wireless Networks	Guangyuan Liu et.al.	2505.23249	null
2025-05-29	Two Is Better Than One: Rotations Scale LoRAs	Hongcan Guo et.al.	2505.23184	null
2025-05-28	HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer	Qi Cai et.al.	2505.22705	link
2025-05-28	Less, but Better: Efficient Multilingual Expansion for LLMs via Layer-wise Mixture-of-Experts	Xue Zhang et.al.	2505.22582	null
2025-05-28	A Human-Centric Approach to Explainable AI for Personalized Education	Vinitra Swamy et.al.	2505.22541	link
2025-05-28	Identity-Preserving Text-to-Image Generation via Dual-Level Feature Decoupling and Expert-Guided Fusion	Kewen Chen et.al.	2505.22360	null
2025-05-28	Advancing Expert Specialization for Better MoE	Hongcan Guo et.al.	2505.22323	null
2025-05-28	ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation	Jiawen Yu et.al.	2505.22159	null
2025-05-28	AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation	Yan Rong et.al.	2505.22053	null
2025-05-28	Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge	Zhongyi Zhou et.al.	2505.21906	null
2025-05-27	MedBridge: Bridging Foundation Vision-Language Models to Medical Image Diagnosis	Yitong Li et.al.	2505.21698	null
2025-05-28	Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity	Yehui Tang et.al.	2505.21411	null
2025-05-27	Unveiling Instruction-Specific Neurons & Experts: An Analytical Framework for LLM's Instruction-Following Capabilities	Junyan Zhang et.al.	2505.21191	null
2025-05-27	Uni3D-MoE: Scalable Multimodal 3D Scene Understanding via Mixture of Experts	Yue Zhang et.al.	2505.21079	null
2025-05-27	Multi-objective Large Language Model Alignment with Hierarchical Experts	Zhuo Li et.al.	2505.20925	null
2025-05-26	FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models	Hao Kang et.al.	2505.20225	link
2025-05-26	NEXT: Multi-Grained Mixture of Experts via Text-Modulation for Multi-Modal Object Re-ID	Shihao Li et.al.	2505.20001	null
2025-05-26	Mosaic: Data-Free Knowledge Distillation via Mixture-of-Experts for Heterogeneous Distributed Environments	Junming Liu et.al.	2505.19699	null
2025-05-26	MoESD: Unveil Speculative Decoding's Potential for Accelerating Sparse MoE	Zongle Huang et.al.	2505.19645	null
2025-05-26	Rethinking Gating Mechanism in Sparse MoE: Handling Arbitrary Modality Inputs with Confidence-Guided Gate	Liangwei Nathan Zheng et.al.	2505.19525	link
2025-05-26	WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference	Sihan Chen et.al.	2505.19427	link
2025-05-25	RankLLM: A Python Package for Reranking with LLMs	Sahel Sharifymoghaddam et.al.	2505.19284	null
2025-05-25	I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts	Jiayi Xin et.al.	2505.19190	link
2025-05-24	TrajMoE: Spatially-Aware Mixture of Experts for Unified Human Mobility Modeling	Chonghua Han et.al.	2505.18670	null
2025-05-24	ThanoRA: Task Heterogeneity-Aware Multi-Task Low-Rank Adaptation	Jian Liang et.al.	2505.18640	link
2025-05-24	Mod-Adapter: Tuning-Free and Versatile Multi-concept Personalization via Modulation Adapter	Weizhi Zhong et.al.	2505.18612	null
2025-05-23	Enhancing CTR Prediction with De-correlated Expert Networks	Jiancheng Wang et.al.	2505.17925	null
2025-05-23	PreMoe: Lightening MoEs on Constrained Memory by Expert Pruning and Retrieval	Zehua Pei et.al.	2505.17639	null
2025-05-23	CoMoE: Contrastive Representation for Mixture-of-Experts in Parameter-Efficient Fine-tuning	Jinyuan Feng et.al.	2505.17553	null
2025-05-23	MEGADance: Mixture-of-Experts Architecture for Genre-Aware 3D Dance Generation	Kaixing Yang et.al.	2505.17543	null
2025-05-22	JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model	Qihao Duan et.al.	2505.17257	null
2025-05-22	DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving	Zhenjie Yang et.al.	2505.16278	null
2025-05-22	DualComp: End-to-End Learning of a Unified Dual-Modality Lossless Compressor	Yan Zhao et.al.	2505.16256	null
2025-05-21	Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models	Jingcong Liang et.al.	2505.16056	link
2025-05-21	MoRE-Brain: Routed Mixture of Experts for Interpretable and Generalizable Cross-Subject fMRI Visual Decoding	Yuxiang Wei et.al.	2505.15946	null
2025-05-21	CoLA: Collaborative Low-Rank Adaptation	Yiyun Zhou et.al.	2505.15471	link
2025-05-22	Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought	Tencent Hunyuan Team et.al.	2505.15431	null
2025-05-21	Efficient Data Driven Mixture-of-Expert Extraction from Trained Networks	Uranik Berisha et.al.	2505.15414	null
2025-05-21	Time Tracker: Mixture-of-Experts-Enhanced Foundation Time Series Forecasting Model with Decoupled Training Pipelines	Xiaohou Shi et.al.	2505.15151	null
2025-05-20	Multimodal Cultural Safety: Evaluation Frameworks and Alignment Strategies	Haoyi Qiu et.al.	2505.14972	link
2025-05-20	Balanced and Elastic End-to-end Training of Dynamic LLMs	Mohamed Wahib et.al.	2505.14864	null
2025-05-20	Solving MNIST with a globally trained Mixture of Quantum Experts	Paolo Alessandro Xavier Tognini et.al.	2505.14789	null
2025-05-20	Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training	Mengru Wang et.al.	2505.14681	null
2025-05-21	Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach	Umberto Cappellazzo et.al.	2505.14336	null
2025-05-20	FuxiMT: Sparsifying Large Language Models for Chinese-Centric Multilingual Machine Translation	Shaolin Zhu et.al.	2505.14256	null
2025-05-20	THOR-MoE: Hierarchical Task-Guided and Context-Responsive Routing for Neural Machine Translation	Yunlong Liang et.al.	2505.14173	null
2025-05-20	Multimodal Mixture of Low-Rank Experts for Sentiment Analysis and Emotion Recognition	Shuo Zhang et.al.	2505.14143	null
2025-05-20	Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging	Ryo Bertolissi et.al.	2505.14136	null
2025-05-20	StPR: Spatiotemporal Preservation and Routing for Exemplar-Free Video Class-Incremental Learning	Huaijie Wang et.al.	2505.13997	null
2025-05-20	Towards Rehearsal-Free Continual Relation Extraction: Capturing Within-Task Variance with Adaptive Prompting	Bao-Ngoc Dao et.al.	2505.13944	link
2025-05-20	U-SAM: An audio language Model for Unified Speech, Audio, and Music Understanding	Ziqian Wang et.al.	2505.13880	link
2025-05-20	EfficientLLM: Efficiency in Large Language Models	Zhengqing Yuan et.al.	2505.13840	null
2025-05-19	CompeteSMoE -- Statistically Guaranteed Mixture of Experts Training via Competition	Nam V. Nguyen et.al.	2505.13380	link
2025-05-19	Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training and Inference	Shuqing Luo et.al.	2505.13345	link
2025-05-19	Seeing the Unseen: How EMoE Unveils Bias in Text-to-Image Diffusion Models	Lucas Berry et.al.	2505.13273	null
2025-05-19	True Zero-Shot Inference of Dynamical Systems Preserving Long-Term Statistics	Christoph Jürgen Hemmer et.al.	2505.13192	null
2025-05-19	Model Selection for Gaussian-gated Gaussian Mixture of Experts Using Dendrograms of Mixing Measures	Tuan Thai et.al.	2505.13052	null
2025-05-18	Scene-Adaptive Motion Planning with Explicit Mixture of Experts and Interaction-Oriented Optimization	Hongbiao Zhu et.al.	2505.12311	null
2025-05-20	Model Merging in Pre-training of Large Language Models	Yunshui Li et.al.	2505.12082	null
2025-05-20	Multi-modal Collaborative Optimization and Expansion Network for Event-assisted Single-eye Expression Recognition	Runduo Han et.al.	2505.12007	link
2025-05-17	MINGLE: Mixtures of Null-Space Gated Low-Rank Experts for Test-Time Continual Model Merging	Zihuan Qiu et.al.	2505.11883	null
2025-05-17	Improving Coverage in Combined Prediction Sets with Weighted p-values	Gina Wong et.al.	2505.11785	null
2025-05-16	MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production	Chao Jin et.al.	2505.11432	null
2025-05-16	MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems	Yinsicheng Jiang et.al.	2505.11415	null
2025-05-16	A Fast Kernel-based Conditional Independence test with Application to Causal Discovery	Oliver Schacht et.al.	2505.11085	null
2025-05-16	On DeepSeekMoE: Statistical Benefits of Shared Experts and Normalized Sigmoid Gating	Huy Nguyen et.al.	2505.10860	null
2025-05-14	PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning	Zongqian Li et.al.	2505.09519	link
2025-05-14	Qwen3 Technical Report	An Yang et.al.	2505.09388	link
2025-05-14	Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures	Chenggang Zhao et.al.	2505.09343	null
2025-05-13	Toward Cost-Efficient Serving of Mixture-of-Experts with Asynchrony	Shaoyu Wang et.al.	2505.08944	null
2025-05-13	PWC-MoE: Privacy-Aware Wireless Collaborative Mixture of Experts	Yang Su et.al.	2505.08719	null
2025-05-13	AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale	Yunjie Ji et.al.	2505.08311	null
2025-05-12	UMoE: Unifying Attention and FFN with Shared Experts	Yuanhang Yang et.al.	2505.07260	null
2025-05-11	Seed1.5-VL Technical Report	Dong Guo et.al.	2505.07062	null
2025-05-11	FreqMoE: Dynamic Frequency Enhancement for Neural PDE Solvers	Tianyu Chen et.al.	2505.06858	null
2025-05-11	The power of fine-grained experts: Granularity boosts expressivity in Mixture of Experts	Enric Boix-Adsera et.al.	2505.06839	null
2025-05-10	Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free	Zihan Qiu et.al.	2505.06708	link
2025-05-10	Emotion-Qwen: Training Hybrid Experts for Unified Emotion and General Vision-Language Understanding	Dawei Huang et.al.	2505.06685	link
2025-05-10	QoS-Efficient Serving of Multiple Mixture-of-Expert LLMs Using Partial Runtime Reconfiguration	HamidReza Imani et.al.	2505.06481	null
2025-05-12	FloE: On-the-Fly MoE Inference on Memory-constrained GPU	Yuxin Zhou et.al.	2505.05950	null
2025-05-09	MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design	Haojie Duanmu et.al.	2505.05799	link
2025-05-08	Divide-and-Conquer: Cold-Start Bundle Recommendation via Mixture of Diffusion Experts	Ming Li et.al.	2505.05035	null
2025-05-07	Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs	Yehui Tang et.al.	2505.04519	null
2025-05-07	SToLa: Self-Adaptive Touch-Language Framework with Tactile Commonsense Reasoning in Open-Ended Scenarios	Ning Cheng et.al.	2505.04201	null
2025-05-07	LLM-e Guess: Can LLMs Capabilities Advance Without Hardware Progress?	Teddy Foley et.al.	2505.04075	link
2025-05-07	Shadow Wireless Intelligence: Large Language Model-Driven Reasoning in Covert Communications	Yuanai Xie et.al.	2505.04068	null
2025-05-06	Towards Smart Point-and-Shoot Photography	Jiawan Li et.al.	2505.03638	null
2025-05-06	Faster MoE LLM Inference for Extremely Large Models	Haoqi Yang et.al.	2505.03531	null
2025-05-06	STAR-Rec: Making Peace with Length Variance and Pattern Diversity in Sequential Recommendation	Maolin Wang et.al.	2505.03484	null
2025-05-06	3D Gaussian Splatting Data Compression with Mixture of Priors	Lei Liu et.al.	2505.03310	null
2025-05-05	Finger Pose Estimation for Under-screen Fingerprint Sensor	Xiongjun Guan et.al.	2505.02481	link
2025-05-05	Multimodal Deep Learning-Empowered Beam Prediction in Future THz ISAC Systems	Kai Zhang et.al.	2505.02381	null
2025-05-05	Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques	Sanjay Surendranath Girija et.al.	2505.02309	null
2025-05-04	Learning Heterogeneous Mixture of Scene Experts for Large-scale Neural Radiance Fields	Zhenxing Mi et.al.	2505.02005	link
2025-05-03	Backdoor Attacks Against Patch-based Mixture of Experts	Cedric Chan et.al.	2505.01811	link
2025-05-01	MoxE: Mixture of xLSTM Experts with Entropy-Aware Routing for Efficient Language Modeling	Abdoul Majid O. Thiombiano et.al.	2505.01459	null
2025-05-02	Aggregation of Dependent Expert Distributions in Multimodal Variational Autoencoders	Rogelio A Mancisidor et.al.	2505.01134	null
2025-05-02	CoCoAFusE: Beyond Mixtures of Experts via Model Fusion	Aurelio Raffa Ugolini et.al.	2505.01105	null
2025-05-01	Improving Routing in Sparse Mixture of Experts with Graph of Tokens	Tam Nguyen et.al.	2505.00792	null
2025-05-01	CICADA: Cross-Domain Interpretable Coding for Anomaly Detection and Adaptation in Multivariate Time Series	Tian Lan et.al.	2505.00415	null
2025-05-01	Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing	Piotr Piękos et.al.	2505.00315	link
2025-04-30	Online Federation For Mixtures of Proprietary Agents with Black-Box Encoders	Xuwei Yang et.al.	2505.00216	null
2025-04-29	TT-LoRA MoE: Unifying Parameter-Efficient Fine-Tuning and Sparse Mixture-of-Experts	Pradip Kunwar et.al.	2504.21190	null
2025-04-29	Token-Level Prompt Mixture with Parameter-Free Routing for Federated Domain Generalization	Shuai Gong et.al.	2504.21063	null
2025-04-26	PICO: Secure Transformers via Robust Prompt Isolation and Cybersecurity Oversight	Ben Goertzel et.al.	2504.21029	null
2025-04-29	MambaMoE: Mixture-of-Spectral-Spatial-Experts State Space Model for Hyperspectral Image Classification	Yichu Xu et.al.	2504.20509	null
2025-04-29	FT-MoE: Sustainable-learning Mixture of Experts Model for Fault-Tolerant Computing with Multiple Tasks	Wenjing Xiao et.al.	2504.20446	null
2025-04-29	MicarVLMoE: A Modern Gated Cross-Aligned Vision-Language Mixture of Experts Model for Medical Image Captioning and Report Generation	Amaan Izhar et.al.	2504.20343	link
2025-04-28	Accelerating Mixture-of-Experts Training with Adaptive Expert Replication	Athinagoras Skiadopoulos et.al.	2504.19925	null
2025-04-28	Decentralization of Generative AI via Mixture of Experts for Wireless Networks: A Comprehensive Survey	Yunting Xu et.al.	2504.19660	null
2025-04-28	ARTEMIS: Autoregressive End-to-End Trajectory Planning with Mixture of Experts for Autonomous Driving	Renju Feng et.al.	2504.19580	link
2025-04-29	BadMoE: Backdooring Mixture-of-Experts LLMs via Optimizing Routing Triggers and Infecting Dormant Experts	Qingyue Wang et.al.	2504.18598	null
2025-04-25	NoEsis: Differentially Private Knowledge Transfer in Modular LLM Adaptation	Rob Romijnders et.al.	2504.18147	null
2025-04-28	Unveiling the Hidden: Movie Genre and User Bias in Spoiler Detection	Haokai Zhang et.al.	2504.17834	link
2025-04-22	Compass-V2 Technical Report	Sophia Maria et.al.	2504.15527	null
2025-04-21	Manifold Induced Biases for Zero-shot and Few-shot Detection of Generated Images	Jonathan Brokman et.al.	2504.15470	link
2025-04-17	D$^{2}$MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving	Haodong Wang et.al.	2504.15299	null
2025-04-23	MoE Parallel Folding: Heterogeneous Parallelism Mappings for Efficient Large-Scale MoE Model Training with Megatron Core	Dennis Liu et.al.	2504.14960	null
2025-04-18	Multi-Type Context-Aware Conversational Recommender Systems via Mixture-of-Experts	Jie Zou et.al.	2504.13655	null
2025-04-18	HAECcity: Open-Vocabulary Scene Understanding of City-Scale Point Clouds with Superpoint Graph Clustering	Alexander Rusnak et.al.	2504.13590	null
2025-04-18	Dense Backpropagation Improves Training for Sparse Mixture-of-Experts	Ashwinee Panda et.al.	2504.12463	link
2025-04-16	Unveiling Hidden Collaboration within Mixture-of-Experts in Large Language Models	Yuanbo Tang et.al.	2504.12359	null
2025-04-16	Trend Filtered Mixture of Experts for Automated Gating of High-Frequency Flow Cytometry Data	Sangwon Hyun et.al.	2504.12287	null
2025-04-16	MOS: Towards Effective Smart Contract Vulnerability Detection through Mixture-of-Experts Tuning of Large Language Models	Hang Yuan et.al.	2504.12234	null
2025-04-15	Simulation-based inference for stochastic nonlinear mixed-effects models with applications in systems biology	Henrik Häggström et.al.	2504.11279	link
2025-04-14	Correlative and Discriminative Label Grouping for Multi-Label Visual Prompt Tuning	LeiLei Ma et.al.	2504.09990	null
2025-04-14	Multi-objective Bayesian Optimization With Mixed-categorical Design Variables for Expensive-to-evaluate Aeronautical Applications	Nathalie Bartoli et.al.	2504.09930	null
2025-04-14	Plasticity-Aware Mixture of Experts for Learning Under QoE Shifts in Adaptive Video Streaming	Zhiqiang He et.al.	2504.09906	null
2025-04-13	Mixture-of-Shape-Experts (MoSE): End-to-End Shape Dictionary Framework to Prompt SAM for Generalizable Medical Segmentation	Jia Wei et.al.	2504.09601	null
2025-04-12	MoE-Lens: Towards the Hardware Limit of High-Throughput MoE LLM Serving Under Resource Constraints	Yichao Yuan et.al.	2504.09345	null
2025-04-12	Mixture of Group Experts for Learning Invariant Representations	Lei Kang et.al.	2504.09265	null
2025-04-11	RouterKT: Mixture-of-Experts for Knowledge Tracing	Han Liao et.al.	2504.08989	link
2025-04-11	Regularized infill criteria for multi-objective Bayesian optimization with application to aircraft design	Robin Grapin et.al.	2504.08671	null
2025-04-10	C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing	Zhongyang Li et.al.	2504.07964	link
2025-04-11	Scaling Laws for Native Multimodal Models	Mustafa Shukor et.al.	2504.07951	null
2025-04-10	Cluster-Driven Expert Pruning for Mixture-of-Experts Large Language Models	Hongcheng Guo et.al.	2504.07807	link
2025-04-10	Adaptive Detection of Fast Moving Celestial Objects Using a Mixture of Experts and Physical-Inspired Neural Network	Peng Jia et.al.	2504.07777	null
2025-04-10	Kimi-VL Technical Report	Kimi Team et.al.	2504.07491	link
2025-04-09	MoEDiff-SR: Mixture of Experts-Guided Diffusion Model for Region-Adaptive MRI Super-Resolution	Zhe Wang et.al.	2504.07308	link
2025-04-11	Holistic Capability Preservation: Towards Compact Yet Comprehensive Reasoning Models	Ling Team et.al.	2504.07158	null
2025-04-09	Domain-Specific Pruning of Large Mixture-of-Experts Models with Few-shot Demonstrations	Zican Dong et.al.	2504.06792	null
2025-04-09	FedMerge: Federated Personalization via Model Merging	Shutong Chen et.al.	2504.06768	null
2025-04-08	S'MoRE: Structural Mixture of Residual Experts for LLM Fine-tuning	Hanqing Zeng et.al.	2504.06426	null
2025-04-08	HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference	Shuzhang Zhong et.al.	2504.05897	link
2025-04-08	Adaptive Substructure-Aware Expert Model for Molecular Property Prediction	Tianyi Jiang et.al.	2504.05844	null
2025-04-10	Finding Fantastic Experts in MoEs: A Unified Study for Expert Dropping Strategies and Observations	Ajay Jaiswal et.al.	2504.05586	null
2025-04-07	SUEDE: Shared Unified Experts for Physical-Digital Face Attack Detection Enhancement	Zuying Xie et.al.	2504.04818	null
2025-04-06	On the Spatial Structure of Mixture-of-Experts in Transformers	Daniel Bershatsky et.al.	2504.04444	null
2025-04-05	Collaboration and Controversy Among Experts: Rumor Early Detection by Tuning a Comment Generator	Bing Wang et.al.	2504.04076	link
2025-04-04	HeterMoE: Efficient Training of Mixture-of-Experts Models on Heterogeneous GPUs	Yongji Wu et.al.	2504.03871	null
2025-04-01	Detecting Financial Fraud with Hybrid Deep Learning: A Mix-of-Experts Approach to Sequential and Anomalous Patterns	Diego Vallarino et.al.	2504.03750	null
2025-04-04	RingMoE: Mixture-of-Modality-Experts Multi-Modal Foundation Models for Universal Remote Sensing Image Interpretation	Hanbo Bi et.al.	2504.03166	null
2025-04-03	TeleMoM: Consensus-Driven Telecom Intelligence via Mixture of Models	Xinquan Wang et.al.	2504.02712	null
2025-04-07	MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators	Beichen Huang et.al.	2504.02658	link
2025-04-07	MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism	Ruidong Zhu et.al.	2504.02263	null
2025-04-02	Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design	Mohan Zhang et.al.	2504.01337	null
2025-04-01	Mixture-of-Experts for Distributed Edge Computing with Channel-Aware Gating Function	Qiuchen Song et.al.	2504.00819	null
2025-04-01	DynMoLE: Boosting Mixture of LoRA Experts Fine-Tuning with a Hybrid Routing Mechanism	Dengchun Li et.al.	2504.00661	link
2025-04-01	Continual Cross-Modal Generalization	Yan Xia et.al.	2504.00561	null
2025-04-01	Mixture-of-Attack-Experts with Class Regularization for Unified Physical-Digital Face Attack Detection	Shunxin Chen et.al.	2504.00458	null
2025-03-31	Unimodal-driven Distillation in Multimodal Emotion Recognition with Dynamic Fusion	Jiagen Li et.al.	2503.23721	null
2025-03-30	Mixture of Routers	Jia-Chen Zhang et.al.	2503.23362	null
2025-03-29	Beyond Standard MoE: Mixture of Latent Experts for Resource-Efficient Language Models	Zehua Liu et.al.	2503.23100	null
2025-03-29	S2MoE: Robust Sparse Mixture of Experts via Stochastic Learning	Giang Do et.al.	2503.23007	null
2025-03-29	Sparse Mixture of Experts as Unified Competitive Learning	Giang Do et.al.	2503.22996	null
2025-04-01	Exploiting Mixture-of-Experts Redundancy Unlocks Multimodal Generative Abilities	Raman Dutt et.al.	2503.22517	null
2025-03-27	RocketPPA: Ultra-Fast LLM-Based PPA Estimator at Code-Level Abstraction	Armin Abdollahi et.al.	2503.21971	null
2025-03-27	iMedImage Technical Report	Ran Wei et.al.	2503.21836	null
2025-03-27	LLaVA-CMoE: Towards Continual Mixture of Experts for Large Vision-Language Models	Hengyuan Zhao et.al.	2503.21227	null
2025-03-26	Optimal Scaling Laws for Efficiency Gains in a Theoretical Transformer-Augmented Sectional MoE Framework	Soham Sane et.al.	2503.20750	null
2025-03-26	UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines	Chen Tang et.al.	2503.20748	null
2025-03-26	Enhancing Multi-modal Models with Heterogeneous MoE Adapters for Fine-tuning	Sashuai Zhou et.al.	2503.20633	null
2025-03-26	MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation	Rongyu Zhang et.al.	2503.20384	null
2025-03-26	Modality-Independent Brain Lesion Segmentation with Privacy-aware Continual Learning	Yousef Sadegheih et.al.	2503.20326	link
2025-03-25	Resilient Sensor Fusion under Adverse Sensor Failures via Multi-Modal Expert Fusion	Konyul Park et.al.	2503.19776	null
2025-03-25	BiPrompt-SAM: Enhancing Image Segmentation via Explicit Selection between Point and Text Prompts	Suzhe Xu et.al.	2503.19769	null
2025-03-25	M$^2$CD: A Unified MultiModal Framework for Optical-SAR Change Detection with Mixture of Experts and Self-Distillation	Ziyuan Liu et.al.	2503.19406	null
2025-03-27	Reimagining Memory Access for LLM Inference: Compression-Aware Memory Controller Design	Rui Xie et.al.	2503.18869	null
2025-03-24	Galaxy Walker: Geometry-aware VLMs For Galaxy-scale Understanding	Tianyu Chen et.al.	2503.18578	null
2025-03-24	SPMTrack: Spatio-Temporal Parameter-Efficient Fine-Tuning with Mixture of Experts for Scalable Visual Tracking	Wenrui Cai et.al.	2503.18338	link
2025-03-23	Challenging Dataset and Multi-modal Gated Mixture of Experts Model for Remote Sensing Copy-Move Forgery Understanding	Ze Zhang et.al.	2503.18104	link
2025-03-22	Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM	Codefuse et.al.	2503.17793	null
2025-03-25	Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts	Yike Yuan et.al.	2503.16057	null
2025-03-21	UniCoRN: Latent Diffusion-based Unified Controllable Image Restoration Network across Multiple Degradations	Debabrata Mandal et.al.	2503.15868	null
2025-05-27	Mixture of Lookup Experts	Shibo Jie et.al.	2503.15798	null
2025-03-21	Leveraging MoE-based Large Language Model for Zero-Shot Multi-Task Semantic Communication	Sin-Yu Huang et.al.	2503.15722	null
2025-03-19	SemEval-2025 Task 1: AdMIRe -- Advancing Multimodal Idiomaticity Representation	Thomas Pickard et.al.	2503.15358	null
2025-03-21	Body-Hand Modality Expertized Networks with Cross-attention for Fine-grained Skeleton Action Recognition	Seungyeon Cho et.al.	2503.14960	null
2025-03-18	Core-Periphery Principle Guided State Space Model for Functional Connectome Classification	Minheng Chen et.al.	2503.14655	null
2025-03-18	MAST-Pro: Dynamic Mixture-of-Experts for Adaptive Segmentation of Pan-Tumors with Knowledge-Driven Prompts	Runqi Meng et.al.	2503.14355	null
2025-03-18	SNAKE: A Sustainable and Multi-functional Traffic Analysis System utilizing Specialized Large-Scale Models with a Mixture of Experts Architecture	Tian Qin et.al.	2503.13808	null
2025-03-17	Optimal Expert Selection for Distributed Mixture-of-Experts at the Wireless Edge	Shengling Qin et.al.	2503.13421	null
2025-03-17	Channel Estimation for Pinching-Antenna Systems (PASS)	Jian Xiao et.al.	2503.13268	null
2025-03-17	Federated Mixture-of-Expert for Non-Overlapped Cross-Domain Sequential Recommendation	Yu Liu et.al.	2503.13254	null
2025-03-16	Fast filtering of non-Gaussian models using Amortized Optimal Transport Maps	Mohammad Al-Jarrah et.al.	2503.12633	link
2025-03-16	MoECollab: Democratizing LLM Development Through Collaborative Mixture of Experts	Harshit et.al.	2503.12592	null
2025-03-16	MExD: An Expert-Infused Diffusion Model for Whole-Slide Image Classification	Jianwei Zhao et.al.	2503.12401	null
2025-03-15	Adaptive Mixture of Experts Learning for Robust Audio Spoofing Detection	Qixian Chen et.al.	2503.12010	null
2025-03-14	FedALT: Federated Fine-Tuning through Adaptive Local Training with Rest-of-the-World LoRA	Jieming Bian et.al.	2503.11880	null
2025-03-14	A Review of DeepSeek Models' Key Innovative Techniques	Chengen Wang et.al.	2503.11486	null
2025-03-14	MoLEx: Mixture of Layer Experts for Finetuning with Sparse Upcycling	Rachel S. Y. Teo et.al.	2503.11144	link
2025-03-13	Samoyeds: Accelerating MoE Models with Structured Sparsity Leveraging Sparse Tensor Cores	Chenpeng Wu et.al.	2503.10725	link
2025-03-14	dFLMoE: Decentralized Federated Learning via Mixture of Experts for Medical Data Analysis	Luyuan Xie et.al.	2503.10412	null
2025-03-13	StableFusion: Continual Video Retrieval via Frame Adaptation	Zecheng Zhao et.al.	2503.10111	link
2025-03-12	Double-Stage Feature-Level Clustering-Based Mixture of Experts Framework	Bakary Badjie et.al.	2503.09504	null
2025-03-12	Towards Robust Multimodal Representation: A Unified Approach with Adaptive Experts and Alignment	Nazanin Moradinasab et.al.	2503.09498	link
2025-03-12	Astrea: A MOE-based Visual Understanding Model with Progressive Alignment	Xiaoda Yang et.al.	2503.09445	null
2025-03-12	Automatic Operator-level Parallelism Planning for Distributed Deep Learning -- A Mixed-Integer Programming Approach	Ruifeng She et.al.	2503.09357	null
2025-03-12	Priority-Aware Preemptive Scheduling for Mixed-Priority Workloads in MoE Inference	Mohammad Siavashi et.al.	2503.09304	null
2025-03-13	FaVChat: Unlocking Fine-Grained Facial Video Understanding with Multimodal Large Language Models	Fufangchen Zhao et.al.	2503.09158	null
2025-03-11	MoE-Loco: Mixture of Experts for Multitask Locomotion	Runhan Huang et.al.	2503.08564	null
2025-03-11	Accelerating MoE Model Inference with Expert Sharding	Oana Balmau et.al.	2503.08467	null
2025-03-11	Uni$\textbf{F}^2$ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models	Junzhe Li et.al.	2503.08120	null
2025-03-11	MoRE: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models	Han Zhao et.al.	2503.08007	null
2025-03-10	GM-MoE: Low-Light Enhancement with Gated-Mechanism Mixture-of-Experts	Minwen Liao et.al.	2503.07417	null
2025-03-10	A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications	Siyuan Mu et.al.	2503.07137	link
2025-03-10	VMTS: Vision-Assisted Teacher-Student Reinforcement Learning for Multi-Terrain Locomotion in Bipedal Robots	Fu Chen et.al.	2503.07049	link
2025-03-10	ResMoE: Space-efficient Compression of Mixture of Experts LLMs via Residual Restoration	Mengting Ai et.al.	2503.06881	link
2025-03-10	eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference	Suraiya Tairin et.al.	2503.06823	null
2025-03-09	MoFE: Mixture of Frozen Experts Architecture	Jean Seo et.al.	2503.06491	null
2025-03-09	Swift Hydra: Self-Reinforcing Generative Framework for Anomaly Detection with Multiple Mamba Models	Nguyen Do et.al.	2503.06413	link
2025-03-08	MoEMoE: Question Guided Dense and Scalable Sparse Mixture-of-Expert for Multi-source Multi-modal Answering	Vinay Kumar Verma et.al.	2503.06296	null
2025-03-08	A Novel Trustworthy Video Summarization Algorithm Through a Mixture of LoRA Experts	Wenzhuo Du et.al.	2503.06064	null
2025-03-08	MANDARIN: Mixture-of-Experts Framework for Dynamic Delirium and Coma Prediction in ICU Patients: Development and Validation of an Acute Brain Dysfunction Prediction Model	Miguel Contreras et.al.	2503.06059	null
2025-03-07	Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning	Justin Chih-Yao Chen et.al.	2503.05641	null
2025-03-07	FMT: A Multimodal Pneumonia Detection Model Based on Stacking MOE Framework	Jingyu Xu et.al.	2503.05626	null
2025-03-07	Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts	Weigao Sun et.al.	2503.05447	link
2025-03-07	Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs	Ling Team et.al.	2503.05139	null
2025-03-07	Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts	Shwai He et.al.	2503.05066	null
2025-03-06	Continual Pre-training of MoEs: How robust is your router?	Benjamin Thérien et.al.	2503.05029	null
2025-03-06	Predictable Scale: Part I -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining	Houyi Li et.al.	2503.04715	null
2025-03-07	Question-Aware Gaussian Experts for Audio-Visual Question Answering	Hongyeob Kim et.al.	2503.04459	link
2025-03-07	Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling	Yan Li et.al.	2503.04398	null
2025-03-06	A Generalist Cross-Domain Molecular Learning Framework for Structure-Based Drug Discovery	Yiheng Zhu et.al.	2503.04362	null
2025-03-06	DM-Adapter: Domain-Aware Mixture-of-Adapters for Text-Based Person Retrieval	Yating Liu et.al.	2503.04144	null
2025-03-05	VoiceGRPO: Modern MoE Transformers with Group Relative Policy Optimization GRPO for AI Voice Health Care Applications on Voice Pathology Detection	Enkhtogtokh Togootogtokh et.al.	2503.03797	link
2025-03-05	Small but Mighty: Enhancing Time Series Forecasting with Lightweight LLMs	Haoran Fan et.al.	2503.03594	link
2025-03-06	Convergence Rates for Softmax Gating Mixture of Experts	Huy Nguyen et.al.	2503.03213	null
2025-03-04	MX-Font++: Mixture of Heterogeneous Aggregation Experts for Few-shot Font Generation	Weihang Wang et.al.	2503.02799	link
2025-03-04	FinArena: A Human-Agent Collaboration Framework for Financial Market Analysis and Forecasting	Congluo Xu et.al.	2503.02692	null
2025-03-04	Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer	Yujiao Yang et.al.	2503.02495	link
2025-03-04	Tabby: Tabular Data Synthesis with Language Models	Sonia Cromp et.al.	2503.02152	null
2025-03-03	ECG-EmotionNet: Nested Mixture of Expert (NMoE) Adaptation of ECG-Foundation Model for Driver Emotion Recognition	Nastaran Mansourian et.al.	2503.01750	null
2025-03-03	Effective High-order Graph Representation Learning for Credit Card Fraud Detection	Yao Zou et.al.	2503.01556	null
2025-03-03	DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models	Yongqi Huang et.al.	2503.01359	null
2025-03-03	PROPER: A Progressive Learning Framework for Personalized Large Language Models with Group-Level Adaptation	Linhai Zhang et.al.	2503.01303	null
2025-03-03	Unify and Anchor: A Context-Aware Transformer for Cross-Domain Time Series Forecasting	Xiaobin Hong et.al.	2503.01157	null
2025-03-02	Explainable Classifier for Malignant Lymphoma Subtyping via Cell Graph and Image Fusion	Daiki Nishiyama et.al.	2503.00925	null
2025-03-01	R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts	Zhongyang Li et.al.	2502.20395	link
2025-02-27	Mixture of Experts for Recognizing Depression from Interview and Reading Tasks	Loukas Ilias et.al.	2502.20213	null
2025-02-27	Mixture of Experts-augmented Deep Unfolding for Activity Detection in IRS-aided Systems	Zeyi Ren et.al.	2502.20183	null
2025-02-27	UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook	Yidi Jiang et.al.	2502.20067	null
2025-03-01	Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts	Shulai Zhang et.al.	2502.19811	link
2025-02-26	Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization	Taishi Nakamura et.al.	2502.19261	null
2025-02-26	OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment	Jiaxin Deng et.al.	2502.18965	null
2025-02-25	Generative AI-enabled Wireless Communications for Robust Low-Altitude Economy Networking	Changyuan Zhao et.al.	2502.18118	null
2025-02-24	The Empirical Impact of Reducing Symmetries on the Performance of Deep Ensembles and MoE	Andrei Chernov et.al.	2502.17391	null
2025-02-24	Delta Decompression for MoE-based LLMs Compression	Hao Gu et.al.	2502.17298	link
2025-02-24	Evaluating Expert Contributions in a MoE LLM for Quiz-Based Tasks	Andrei Chernov et.al.	2502.17187	null
2025-02-24	Muon is Scalable for LLM Training	Jingyuan Liu et.al.	2502.16982	link
2025-02-24	BigMac: A Communication-Efficient Mixture-of-Experts Model Structure for Fast Training and Inference	Zewen Jin et.al.	2502.16927	null
2025-02-24	ENACT-Heart -- ENsemble-based Assessment Using CNN and Transformer on Heart Sounds	Jiho Han et.al.	2502.16914	null
2025-02-26	Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment	Chenghao Fan et.al.	2502.16894	link
2025-02-22	An Autonomous Network Orchestration Framework Integrating Large Language Models with Continual Reinforcement Learning	Masoud Shokrnezhad et.al.	2502.16198	null
2025-02-21	A fast convergence algorithm based on binary integer programming for expert load balancing in MoE LLMs	Yuan Sun et.al.	2502.15451	link
2025-02-21	Tight Clusters Make Specialized Experts	Stefan K. Nielsen et.al.	2502.15315	link
2025-02-21	Multimodal Graph-Based Variational Mixture of Experts Network for Zero-Shot Multimodal Information Extraction	Baohang Zhou et.al.	2502.15290	link
2025-02-20	Ray-Tracing for Conditionally Activated Neural Networks	Claudio Gallicchio et.al.	2502.14788	null
2025-02-21	ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model	Zhongyi Zhou et.al.	2502.14420	link
2025-02-19	Unraveling the Localized Latents: Learning Stratified Manifold Structures in LLM Embedding Space with Sparse Mixture-of-Experts	Xin Li et.al.	2502.13577	null
2025-02-18	MoBA: Mixture of Block Attention for Long-Context LLMs	Enzhe Lu et.al.	2502.13189	link
2025-02-18	Every Expert Matters: Towards Effective Knowledge Distillation for Mixture-of-Experts Language Models	Gyeongman Kim et.al.	2502.12947	null
2025-02-18	DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs	Minxuan Lv et.al.	2502.12455	null
2025-02-17	From Dense to Dynamic: Token-Difficulty Driven MoEfication of Pre-Trained LLMs	Kumari Nishu et.al.	2502.12325	null
2025-02-17	Accurate Expert Predictions in MoE Inference via Cross-Layer Gate	Zhiyuan Fang et.al.	2502.12224	null
2025-02-17	How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines	Ayan Sengupta et.al.	2502.12051	null
2025-02-17	Connector-S: A Survey of Connectors in Multi-modal Large Language Models	Xun Zhu et.al.	2502.11453	null
2025-02-16	Mixture of Tunable Experts - Behavior Modification of DeepSeek-R1 at Inference Time	Robert Dahlke et.al.	2502.11096	null
2025-02-16	ClimateLLM: Efficient Weather Forecasting via Frequency-Aware Large Language Models	Shixuan Li et.al.	2502.11059	null
2025-02-15	Semantic Specialization in MoE Appears with Scale: A Study of DeepSeek R1 Expert Specialization	Matthew Lyle Olson et.al.	2502.10928	null
2025-02-12	Heterogeneous Mixture of Experts for Remote Sensing Image Super-Resolution	Bowen Chen et.al.	2502.09654	link
2025-02-14	Eidetic Learning: an Efficient and Provable Solution to Catastrophic Forgetting	Nicholas Dronen et.al.	2502.09500	link
2025-02-12	The MoE-Empowered Edge LLMs Deployment: Architecture, Challenges, and Opportunities	Ning Li et.al.	2502.08381	null
2025-02-12	Mixture of Decoupled Message Passing Experts with Entropy Constraint for General Node Classification	Xuanze Chen et.al.	2502.08083	null
2025-02-13	Training Sparse Mixture Of Experts Text Embedding Models	Zach Nussbaum et.al.	2502.07972	link
2025-02-11	Memory Analysis on the Training Course of DeepSeek Models	Ping Zhang et.al.	2502.07846	null
2025-02-11	MoENAS: Mixture-of-Expert based Neural Architecture Search for jointly Accurate, Fair, and Robust Edge Deep Neural Networks	Lotfi Abdelkrim Mecharbat et.al.	2502.07422	null
2025-02-11	Online Aggregation of Trajectory Predictors	Alex Tong et.al.	2502.07178	null
2025-02-09	Klotski: Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline	Zhiyuan Fang et.al.	2502.06888	null
2025-02-10	MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing	Seokjin Go et.al.	2502.06643	null
2025-02-10	Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE	Haiduo Huang et.al.	2502.06282	link
2025-02-10	Fair-MoE: Fairness-Oriented Mixture of Experts in Vision-Language Models	Peiran Wang et.al.	2502.06094	null
2025-02-08	Mol-MoE: Training Preference-Guided Routers for Molecule Generation	Diego Calanzone et.al.	2502.05633	link
2025-02-08	UbiMoE: A Ubiquitous Mixture-of-Experts Vision Transformer Accelerator With Hybrid Computation Pattern on FPGA	Jiale Dong et.al.	2502.05602	link
2025-02-07	fMoE: Fine-Grained Expert Offloading for Large Mixture-of-Experts Serving	Hanfei Yu et.al.	2502.05370	null
2025-02-07	Towards Foundational Models for Dynamical System Reconstruction: Hierarchical Meta-Learning via Mixture of Experts	Roussel Desmond Nzoyem et.al.	2502.05335	null
2025-02-07	Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient	Jan Ludziejewski et.al.	2502.05172	null
2025-02-06	Mixture of neural operator experts for learning boundary conditions and model selection	Dwyer Deighan et.al.	2502.04562	null
2025-02-06	CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference	Zehua Pei et.al.	2502.04416	link
2025-02-06	Rank Also Matters: Hierarchical Configuration for Mixture of Adapter Experts in LLM Fine-Tuning	Peizhuang Cong et.al.	2502.03884	null
2025-02-05	(GG) MoE vs. MLP on Tabular Data	Andrei Chernov et.al.	2502.03608	null
2025-02-05	RepLoRA: Reparameterizing Low-Rank Adaptation via the Perspective of Mixture of Experts	Tuan Truong et.al.	2502.03044	null
2025-02-05	On Zero-Initialized Attention: Optimal Prompt and Gating Factor Estimation	Nghiem T. Diep et.al.	2502.03029	null
2025-02-05	Scaling Laws for Upcycling Mixture-of-Experts Language Models	Seng Pei Liew et.al.	2502.03009	null
2025-02-04	ReGNet: Reciprocal Space-Aware Long-Range Modeling and Multi-Property Prediction for Crystals	Jianan Nie et.al.	2502.02748	null
2025-02-04	Hecate: Unlocking Efficient Sparse Model Training via Fully Sharded Sparse Data Parallelism	Yuhao Qing et.al.	2502.02581	null
2025-02-05	Brief analysis of DeepSeek R1 and its implications for Generative AI	Sarah Mercer et.al.	2502.02523	null
2025-02-04	M2R2: Mixture of Multi-Rate Residuals for Efficient Transformer Inference	Nikhil Bhendawade et.al.	2502.02040	null
2025-02-05	MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation	Haibo Tong et.al.	2502.01719	null
2025-02-04	MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs	Yuhang Zhou et.al.	2502.00997	null
2025-02-03	CLIP-UP: A Simple and Efficient Mixture-of-Experts CLIP Training Recipe with Sparse Upcycling	Xinze Wang et.al.	2502.00965	null
2025-02-02	UniGraph2: Learning a Unified Embedding Space to Bind Multimodal Graphs	Yufei He et.al.	2502.00806	link
2025-02-02	Distribution-aware Fairness Learning in Medical Image Segmentation From A Control-Theoretic Perspective	Yujin Oh et.al.	2502.00619	link
2025-02-01	PM-MOE: Mixture of Experts on Private Model Parameters for Personalized Federated Learning	Yu Feng et.al.	2502.00354	link
2025-02-01	Sigmoid Self-Attention is Better than Softmax Self-Attention: A Mixture-of-Experts Perspective	Fanqi Yan et.al.	2502.00281	null
2025-01-31	Pheromone-based Learning of Optimal Reasoning Paths	Anirudh Chari et.al.	2501.19278	null
2025-01-31	Adaptive Prompt: Unlocking the Power of Visual Prompt Tuning	Minh Le et.al.	2501.18936	null
2025-01-30	MolGraph-xLSTM: A graph-based dual-level xLSTM framework with multi-head mixture-of-experts for enhanced molecular representation and interpretability	Yan Sun et.al.	2501.18439	null
2025-01-29	Free Agent in Agent-Based Mixture-of-Experts Generative AI Framework	Jung-Hua Liu et.al.	2501.17903	null
2025-01-29	Heuristic-Informed Mixture of Experts for Link Prediction in Multilayer Networks	Lucio La Cava et.al.	2501.17557	null
2025-01-28	3D-MoE: A Mixture-of-Experts Multi-modal LLM for 3D Vision and Pose Diffusion via Rectified Flow	Yueen Ma et.al.	2501.16698	null
2025-01-27	MoEVD: Enhancing Vulnerability Detection by Mixture-of-Experts (MoE)	Xu Yang et.al.	2501.16454	null
2025-01-27	Static Batching of Irregular Workloads on GPUs: Framework and Application to Efficient MoE Model Inference	Yinghan Li et.al.	2501.16103	null
2025-01-25	ToMoE: Converting Dense Large Language Models to Mixture-of-Experts through Dynamic Structural Pruning	Shangqian Gao et.al.	2501.15316	null
2025-01-25	FreqMoE: Enhancing Time Series Forecasting through Frequency Decomposition Mixture of Experts	Ziqi Liu et.al.	2501.15125	link
2025-01-25	Each Rank Could be an Expert: Single-Ranked Mixture of Experts LoRA for Multi-Task Learning	Ziyu Zhao et.al.	2501.15103	null
2025-01-24	Mean-field limit from general mixtures of experts to quantum neural networks	Anderson Melchor Hernandez et.al.	2501.14660	null
2025-01-24	Hierarchical Time-Aware Mixture of Experts for Multi-Modal Sequential Recommendation	Shengzhe Zhang et.al.	2501.14269	link
2025-01-24	Sparse Mixture-of-Experts for Non-Uniform Noise Reduction in MRI Images	Zeyun Deng et.al.	2501.14198	null
2025-01-23	CSAOT: Cooperative Multi-Agent System for Active Object Tracking	Hy Nguyen et.al.	2501.13994	null
2025-01-22	Autonomy-of-Experts Models	Ang Lv et.al.	2501.13074	null
2025-01-22	LLM4WM: Adapting LLM for Wireless Multi-Tasking	Xuanyu Liu et.al.	2501.12983	null
2025-01-22	UniUIR: Considering Underwater Image Restoration as An All-in-One Learner	Xu Zhang et.al.	2501.12981	null
2025-01-22	BLR-MoE: Boosted Language-Routing Mixture of Experts for Domain-Robust Multilingual E2E ASR	Guodong Ma et.al.	2501.12602	null
2025-01-21	Modality Interactive Mixture-of-Experts for Fake News Detection	Yifan Liu et.al.	2501.12431	link
2025-01-21	SCFCRC: Simultaneously Counteract Feature Camouflage and Relation Camouflage for Fraud Detection	Xiaocheng Zhang et.al.	2501.12430	null
2025-01-21	Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models	Samira Abnar et.al.	2501.12370	null
2025-01-21	MoGERNN: An Inductive Traffic Predictor for Unobserved Locations in Dynamic Sensing Networks	Qishen Zhou et.al.	2501.12281	link
2025-01-21	Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models	Zihan Qiu et.al.	2501.11873	null
2025-01-18	FSMoE: A Flexible and Scalable Training System for Sparse Mixture-of-Experts Models	Xinglin Pan et.al.	2501.10714	null
2025-01-17	OMoE: Diversifying Mixture of Low-Rank Adaptation by Orthogonal Finetuning	Jinyuan Feng et.al.	2501.10062	null
2025-01-17	LLM-Based Routing in Mixture of Experts: A Novel Framework for Trading	Kuan-Ming Liu et.al.	2501.09636	null
2025-01-14	MiniMax-01: Scaling Foundation Models with Lightning Attention	MiniMax et.al.	2501.08313	null
2025-01-14	GRAPHMOE: Amplifying Cognitive Depth of Mixture-of-Experts Network via Introducing Self-Rethinking Mechanism	Chen Tang et.al.	2501.07890	null
2025-01-18	PSReg: Prior-guided Sparse Mixture of Experts for Point Cloud Registration	Xiaoshui Huang et.al.	2501.07762	null
2025-01-13	A Multi-Modal Deep Learning Framework for Pan-Cancer Prognosis	Binyu Zhang et.al.	2501.07016	link
2025-01-12	Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning	Hanwen Zhong et.al.	2501.06884	link
2025-01-10	TAMER: A Test-Time Adaptive MoE-Driven Framework for EHR Representation Learning	Yinghao Zhu et.al.	2501.05661	link
2025-01-09	Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing	Mengfan Liu et.al.	2501.05313	null
2025-01-07	LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes	Xiang Xu et.al.	2501.04004	link
2025-01-07	mFabric: An Efficient and Scalable Fabric for Mixture-of-Experts Training	Xudong Liao et.al.	2501.03905	null
2025-01-08	Mixture-of-Experts Graph Transformers for Interpretable Particle Collision Detection	Donatella Genovese et.al.	2501.03432	null
2025-01-12	Fresh-CL: Feature Realignment through Experts on Hypersphere in Continual Learning	Zhongyi Zhou et.al.	2501.02198	null
2025-01-03	MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders	Jiajun Cao et.al.	2501.01709	null
2025-01-01	REM: A Scalable Reinforced Multi-Expert Framework for Multiplex Influence Maximization	Huyen Nguyen et.al.	2501.00779	null
2025-01-06	Superposition in Transformers: A Novel Way of Building Mixture of Experts	Ayoub Ben Chaliah et.al.	2501.00530	link
2024-12-31	CNC: Cross-modal Normality Constraint for Unsupervised Multi-class Anomaly Detection	Xiaolei Wang et.al.	2501.00346	null
2024-12-29	Multimodal Variational Autoencoder: a Barycentric View	Peijie Qiu et.al.	2412.20487	null
2024-12-29	A Comprehensive Framework for Reliable Legal AI: Combining Specialized Expert Systems and Adaptive Refinement	Sidra Nasir et.al.	2412.20468	null
2024-12-28	UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity	Jingbo Lin et.al.	2412.20157	link
2024-12-28	Distilled Transformers with Locally Enhanced Global Representations for Face Forgery Detection	Yaning Zhang et.al.	2412.20156	null
2024-12-27	DeepSeek-V3 Technical Report	DeepSeek-AI et.al.	2412.19437	link
2024-12-26	AskChart: Universal Chart Understanding through Textual Enhancement	Xudong Yang et.al.	2412.19146	link
2024-12-30	Graph Mixture of Experts and Memory-augmented Routers for Multivariate Time Series Anomaly Detection	Xiaoyu Huang et.al.	2412.19108	null
2024-12-24	Modeling the Centaur: Human-Machine Synergy in Sequential Decision Making	David Shoresh et.al.	2412.18593	link
2024-12-24	BIG-MoE: Bypass Isolated Gating MoE for Generalized Multimodal Face Anti-Spoofing	Yingjie Ma et.al.	2412.18065	link
2024-12-23	UME: Upcycling Mixture-of-Experts for Scalable and Efficient Automatic Speech Recognition	Li Fu et.al.	2412.17507	null
2024-12-23	BrainMAP: Learning Multiple Activation Pathways in Brain Networks	Song Wang et.al.	2412.17404	link
2024-12-22	Part-Of-Speech Sensitivity of Routers in Mixture of Experts Models	Elie Antoine et.al.	2412.16971	null
2024-12-20	Theory of Mixture-of-Experts for Mobile Edge Computing	Hongbo Li et.al.	2412.15690	null
2024-12-19	MoEtion: Efficient and Reliable Checkpointing for Mixture-of-Experts Models at Scale	Swapnil Gandhi et.al.	2412.15411	null
2024-12-19	Qwen2.5 Technical Report	Qwen et.al.	2412.15115	link
2024-12-19	ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing	Ziteng Wang et.al.	2412.14711	link
2024-12-18	A Survey on Inference Optimization Techniques for Mixture of Experts Models	Jiacheng Liu et.al.	2412.14219	link
2024-12-18	SEKE: Specialised Experts for Keyword Extraction	Matej Martinc et.al.	2412.14087	link
2024-12-18	MedCoT: Medical Chain of Thought via Hierarchical Expert	Jiaxiang Liu et.al.	2412.13736	link
2024-12-17	SMOSE: Sparse Mixture of Shallow Experts for Interpretable Reinforcement Learning in Continuous Control Tasks	Mátyás Vincze et.al.	2412.13053	link
2024-12-17	Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning	Moritz Reuss et.al.	2412.12953	null
2024-12-17	CAMEL: Cross-Attention Enhanced Mixture-of-Experts and Language Bias for Code-Switching Speech Recognition	He Wang et.al.	2412.12760	null
2024-12-16	Investigating Mixture of Experts in Dense Retrieval	Effrosyni Sokli et.al.	2412.11864	null
2024-12-18	Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture	Jingze Shi et.al.	2412.11834	link
2024-12-16	Towards Adversarial Robustness of Model-Level Mixture-of-Experts Architectures for Semantic Segmentation	Svetlana Pavlitska et.al.	2412.11608	link
2024-12-16	Enhancing Healthcare Recommendation Systems with a Multimodal LLMs-based MOE Architecture	Jingyu Xu et.al.	2412.11557	null
2024-12-14	DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification	Yuhao Wang et.al.	2412.10650	link
2024-12-13	DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding	Zhiyu Wu et.al.	2412.10302	link
2024-12-13	Llama 3 Meets MoE: Efficient Upcycling	Aditya Vavre et.al.	2412.09952	link
2024-12-12	Memory Layers at Scale	Vincent-Pierre Berges et.al.	2412.09764	link
2024-12-12	Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine	Xiaoshuang Huang et.al.	2412.09278	link
2024-12-12	Adaptive Prompting for Continual Relation Extraction: A Within-Task Variance Perspective	Minh Le et.al.	2412.08285	null
2024-12-11	Mixture of Experts Meets Decoupled Message Passing: Towards General and Adaptive Node Classification	Xuanze Chen et.al.	2412.08193	link
2024-12-10	MoE-CAP: Cost-Accuracy-Performance Benchmarking for Mixture-of-Experts Systems	Yao Fu et.al.	2412.07067	null
2024-12-07	Partition of Unity Physics-Informed Neural Networks (POU-PINNs): An Unsupervised Framework for Physics-Informed Domain Decomposition and Mixtures of Experts	Arturo Rodriguez et.al.	2412.06842	null
2024-12-09	Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset	Xiao Wang et.al.	2412.06647	link
2024-12-09	UniPaint: Unified Space-time Video Inpainting via Mixture-of-Experts	Zhen Wan et.al.	2412.06340	null
2024-12-08	Hallucination-aware Optimization for Large Language Model-empowered Communications	Yinqiu Liu et.al.	2412.06007	link
2024-12-10	An Entailment Tree Generation Approach for Multimodal Multi-Hop Question Answering with Mixture-of-Experts and Iterative Feedback Mechanism	Qing Zhang et.al.	2412.05821	null
2024-12-10	RSUniVLM: A Unified Vision Language Model for Remote Sensing via Granularity-oriented Mixture of Experts	Xu Liu et.al.	2412.05679	link
2024-12-07	SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts	Gengze Zhou et.al.	2412.05552	link
2024-12-07	Towards 3D Acceleration for low-power Mixture-of-Experts and Multi-Head Attention Spiking Transformers	Boxun Xu et.al.	2412.05540	null
2024-12-06	Steps are all you need: Rethinking STEM Education with Prompt Engineering	Krishnasai Addala et.al.	2412.05023	null
2024-12-09	Monet: Mixture of Monosemantic Experts for Transformers	Jungwoo Park et.al.	2412.04139	link
2024-12-05	Meta-Reinforcement Learning With Mixture of Experts for Generalizable Multi Access in Heterogeneous Wireless Networks	Zhaoyang Liu et.al.	2412.03850	null
2024-12-04	Convolutional Neural Networks and Mixture of Experts for Intrusion Detection in 5G Networks and beyond	Loukas Ilias et.al.	2412.03483	null
2024-12-05	MQFL-FHE: Multimodal Quantum Federated Learning Framework with Fully Homomorphic Encryption	Siddhant Dutta et.al.	2412.01858	null
2024-12-05	Yi-Lightning Technical Report	01.AI et.al.	2412.01253	null
2024-11-30	Mixture of Experts for Node Classification	Yu Shi et.al.	2412.00418	null
2024-11-30	HiMoE: Heterogeneity-Informed Mixture-of-Experts for Fair Spatial-Temporal Forecasting	Shaohan Yu et.al.	2412.00316	null
2024-11-27	Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference	Andrii Skliar et.al.	2412.00099	null
2024-11-29	LaVIDE: A Language-Vision Discriminator for Detecting Changes in Satellite Image with Map References	Shuguo Jiang et.al.	2411.19758	null
2024-11-28	On the effectiveness of discrete representations in sparse mixture of experts	Giang Do et.al.	2411.19402	null
2024-11-28	Bayesian Cluster Weighted Gaussian Models	Panagiotis Papastamoulis et.al.	2411.18957	link
2024-11-27	UOE: Unlearning One Expert Is Enough For Mixture-of-experts LLMS	Haomin Zhuang et.al.	2411.18797	null
2024-11-27	Complexity Experts are Task-Discriminative Learners for Any Image Restoration	Eduard Zamfir et.al.	2411.18466	null
2024-11-27	Mixture of Experts in Image Classification: What's the Sweet Spot?	Mathurin Videau et.al.	2411.18322	null
2024-11-26	$H^3$ Fusion: Helpful, Harmless, Honest Fusion of Aligned LLMs	Selim Furkan Tekin et.al.	2411.17792	link
2024-11-25	Staleness-Centric Optimizations for Efficient Diffusion MoE Inference	Jiajun Luo et.al.	2411.16786	null
2024-11-29	MH-MoE: Multi-Head Mixture-of-Experts	Shaohan Huang et.al.	2411.16205	null
2024-11-25	LDACP: Long-Delayed Ad Conversions Prediction Model for Bidding Strategy	Peng Cui et.al.	2411.16095	null
2024-11-24	Hiding Communication Cost in Distributed LLM Training via Micro-batch Co-execution	Haiquan Wang et.al.	2411.15871	null
2024-11-24	LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training	Xiaoye Qu et.al.	2411.15708	link
2024-11-23	Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts	Qizhou Chen et.al.	2411.15432	null
2024-11-23	Communication-Efficient Sparsely-Activated Model Training via Sequence Migration and Token Condensation	Fahao Chen et.al.	2411.15419	null
2024-11-20	MERLOT: A Distilled LLM-based Mixture-of-Experts Framework for Scalable Encrypted Traffic Classification	Yuxuan Chen et.al.	2411.13004	null
2024-11-23	KAAE: Numerical Reasoning for Knowledge Graphs via Knowledge-aware Attributes Learning	Ming Yin et.al.	2411.12950	null
2024-11-19	Ultra-Sparse Memory Network	Zihao Huang et.al.	2411.12364	null
2024-11-18	MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs	Shiyi Cao et.al.	2411.11217	null
2024-11-16	Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts	Jinqiang Long et.al.	2411.10669	link
2024-11-15	Weakly-Supervised Multimodal Learning on MIMIC-CXR	Andrea Agostini et.al.	2411.10356	link
2024-11-21	Pro-Prophet: A Systematic Load Balancing Method for Efficient Parallel Training of Large-scale MoE Models	Wei Wang et.al.	2411.10003	null
2024-11-13	Lynx: Enabling Efficient MoE Inference through Dynamic Batch-Aware Expert Selection	Vima Gupta et.al.	2411.08982	null
2024-11-13	Sparse Upcycling: Inference Inefficient Finetuning	Sasha Doubov et.al.	2411.08968	null
2024-11-13	LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing	Xiaonan Nie et.al.	2411.08446	null
2024-11-12	Imitation Learning from Observations: An Autoregressive Mixture of Experts Approach	Renzi Wang et.al.	2411.08232	null
2024-11-12	PERFT: Parameter-Efficient Routed Fine-Tuning for Mixture-of-Expert Model	Yilun Liu et.al.	2411.08212	null
2024-11-12	Towards Vision Mixture of Experts for Wildlife Monitoring on the Edge	Emmanuel Azuh Mensah et.al.	2411.07834	null
2024-11-11	Adaptive Conditional Expert Selection Network for Multi-domain Recommendation	Kuiyao Dong et.al.	2411.06826	null
2024-11-11	WDMoE: Wireless Distributed Mixture of Experts for Large Language Models	Nan Xue et.al.	2411.06681	null
2024-11-09	Learning Mixtures of Experts with EM	Quentin Fruytier et.al.	2411.06056	null
2024-11-08	NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts	Yen-Ting Lin et.al.	2411.05945	null
2024-11-05	DA-MoE: Addressing Depth-Sensitivity in Graph-Level Analysis through Mixture of Experts	Zelin Yao et.al.	2411.03025	link
2024-11-05	Advancing Robust Underwater Acoustic Target Recognition through Multi-task Learning and Multi-Gate Mixture-of-Experts	Yuan Xie et.al.	2411.02787	null
2024-11-06	Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent	Xingwu Sun et.al.	2411.02265	null
2024-11-04	FedMoE-DA: Federated Mixture of Experts via Domain Aware Fine-grained Aggregation	Ziwei Zhan et.al.	2411.02115	null
2024-11-03	RS-MoE: Mixture of Experts for Remote Sensing Image Captioning and Visual Question Answering	Hui Lin et.al.	2411.01595	null
2024-11-03	Facet-Aware Multi-Head Mixture-of-Experts Model for Sequential Recommendation	Mingrui Liu et.al.	2411.01457	null
2024-11-06	HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference	Peng Tang et.al.	2411.01433	null
2024-11-07	HEXA-MoE: Efficient and Heterogeneous-aware MoE Acceleration with ZERO Computation Redundancy	Shuqing Luo et.al.	2411.01288	link
2024-11-02	PMoL: Parameter Efficient MoE for Preference Mixing of LLM Alignment	Dongxu Liu et.al.	2411.01245	null
2024-11-01	MoE-I$^2$: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition	Cheng Yang et.al.	2411.01016	null
2024-11-01	LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models	Nam V. Nguyen et.al.	2411.00918	link
2024-11-01	MoNTA: Accelerating Mixture-of-Experts Training with Network-Traffc-Aware Parallel Optimization	Jingming Guo et.al.	2411.00662	link
2024-10-31	Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts	Xiang Deng et.al.	2410.23836	null
2024-10-30	Efficient and Interpretable Grammatical Error Correction with Mixture of Experts	Muhammad Reza Qorib et.al.	2410.23507	link
2024-10-30	Stealing User Prompts from Mixture of Experts	Itay Yona et.al.	2410.22884	null
2024-10-30	MALoRA: Mixture of Asymmetric Low-Rank Adaptation for Enhanced Multi-Task Learning	Xujia Wang et.al.	2410.22782	null
2024-10-29	ProMoE: Fast MoE-based LLM Serving using Proactive Caching	Xiaoniu Song et.al.	2410.22134	null
2024-10-29	Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging	Li Shen et.al.	2410.21804	null
2024-10-29	Neural Experts: Mixture of Experts for Implicit Neural Representations	Yizhak Ben-Shabat et.al.	2410.21643	null
2024-10-28	FinTeamExperts: Role Specialized MOEs For Financial Analysis	Yue Yu et.al.	2410.21338	null
2024-10-28	Efficient Mixture-of-Expert for Video-based Driver State and Physiological Multi-task Estimation in Conditional Autonomous Driving	Jiyao Wang et.al.	2410.21086	null
2024-10-27	Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation	Maohao Shen et.al.	2410.20336	null
2024-10-27	GUMBEL-NERF: Representing Unseen Objects as Part-Compositional Neural Radiance Fields	Yusuke Sekikawa et.al.	2410.20306	null
2024-10-25	DMT-HI: MOE-based Hyperbolic Interpretable Deep Manifold Transformation for Unspervised Dimensionality Reduction	Zelin Zang et.al.	2410.19504	link
2024-10-25	Hierarchical Mixture of Experts: Generalizable Learning for High-Level Synthesis	Weikai Li et.al.	2410.19225	link
2024-10-24	Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design	Ruisi Cai et.al.	2410.19123	link
2024-10-24	Mixture of Parrots: Experts improve memorization more than reasoning	Samy Jelassi et.al.	2410.19034	null
2024-10-24	MoMQ: Mixture-of-Experts Enhances Multi-Dialect Query Generation across Relational and Non-Relational Databases	Zhisheng Lin et.al.	2410.18406	null
2024-10-23	Robust and Explainable Depression Identification from Speech Using Vowel-Based Ensemble Learning Approaches	Kexin Feng et.al.	2410.18298	null
2024-10-23	MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning	Jingfan Zhang et.al.	2410.18035	null
2024-10-24	ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference	Xin He et.al.	2410.17954	null
2024-10-23	Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition	Artem Basharin et.al.	2410.17765	null
2024-10-22	Optimizing Mixture-of-Experts Inference Time Combining Model Deployment and Communication Scheduling	Jialong Li et.al.	2410.17043	null
2024-10-21	LMHaze: Intensity-aware Image Dehazing with a Large-scale Multi-intensity Real Haze Dataset	Ruikun Zhang et.al.	2410.16095	link
2024-10-22	CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts	Zhenpeng Su et.al.	2410.16077	link
2024-10-21	Generalizing Motion Planners with Mixture of Experts for Autonomous Driving	Qiao Sun et.al.	2410.15774	link
2024-10-21	ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts	Xumeng Han et.al.	2410.15732	null
2024-10-20	Unveiling and Consulting Core Experts in Retrieval-Augmented MoE-based LLMs	Xin Zhou et.al.	2410.15438	null
2024-10-20	LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration	Yuang Ai et.al.	2410.15385	link
2024-10-19	MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning	Suning Huang et.al.	2410.14972	null
2024-10-18	MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts	Rachel S. Y. Teo et.al.	2410.14574	link
2024-10-18	ST-MoE-BERT: A Spatial-Temporal Mixture-of-Experts Framework for Long-Term Cross-City Mobility Prediction	Haoyu He et.al.	2410.14099	link
2024-10-17	Enhancing Generalization in Sparse Mixture of Experts Models: The Case for Increased Expert Activation in Compositional Tasks	Jinze Zhao et.al.	2410.13964	null
2024-10-16	On the Risk of Evidence Pollution for Malicious Social Text Detection in the Era of LLMs	Herun Wan et.al.	2410.12600	null
2024-10-16	Understanding Expert Structures on Minimax Parameter Estimation in Contaminated Mixture of Experts	Fanqi Yan et.al.	2410.12258	null
2024-10-16	EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference	Yulei Qian et.al.	2410.12247	null
2024-10-15	MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router	Yanyue Xie et.al.	2410.12013	null
2024-10-15	MoH: Multi-Head Attention as Mixture-of-Head Attention	Peng Jin et.al.	2410.11842	link
2024-10-15	GaVaMoE: Gaussian-Variational Gated Mixture of Experts for Explainable Recommendation	Fei Tang et.al.	2410.11841	link
2024-10-15	Transformer Layer Injection: A Novel Approach for Efficient Upscaling of Large Language Models	James Vo et.al.	2410.11654	null
2024-10-16	Quadratic Gating Functions in Mixture of Experts: A Statistical Insight	Pedram Akbarian et.al.	2410.11222	null
2024-10-16	Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free	Ziyue Li et.al.	2410.10814	link
2024-10-14	Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts	Guorui Zheng et.al.	2410.10626	link
2024-10-14	Learning to Ground VLMs without Forgetting	Aritra Bhowmik et.al.	2410.10491	null
2024-10-14	Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts	Xu Liu et.al.	2410.10469	null
2024-10-15	Ada-K Routing: Boosting the Efficiency of MoE-based LLMs	Tongtian Yue et.al.	2410.10456	null
2024-10-14	Tighter Risk Bounds for Mixtures of Experts	Wissam Akretche et.al.	2410.10397	null
2024-10-14	Scalable Multi-Domain Adaptation of Language Models using Modular Experts	Peter Schafhalter et.al.	2410.10181	null
2024-10-14	Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models	Jun Luo et.al.	2410.10114	link
2024-10-14	AlphaLoRA: Assigning LoRA Experts Based on Layer Training Quality	Peijun Qing et.al.	2410.10054	link
2024-10-13	ContextWIN: Whittle Index Based Mixture-of-Experts Neural Model For Restless Bandits Via Deep RL	Zhanqiu Guo et.al.	2410.09781	null
2024-10-11	Semi-Supervised Learning of Noisy Mixture of Experts Models	Oh-Ran Kwon et.al.	2410.09039	null
2024-10-11	Retraining-Free Merging of Sparse Mixture-of-Experts via Hierarchical Clustering	I-Chun Chen et.al.	2410.08589	link
2024-10-10	Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts	Sukwon Yun et.al.	2410.08245	link
2024-10-10	Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training	Gen Luo et.al.	2410.08202	null
2024-10-10	Efficient Dictionary Learning with Switch Sparse Autoencoders	Anish Mudide et.al.	2410.08201	link
2024-10-10	More Experts Than Galaxies: Conditionally-overlapping Experts With Biologically-Inspired Fixed Routing	Sagi Shaier et.al.	2410.08003	link
2024-10-10	SLIM: Let LLM Learn More and Forget Less with Soft LoRA and Identity Mixture	Jiayi Han et.al.	2410.07739	null
2024-10-10	Upcycling Large Language Models into Mixture of Experts	Ethan He et.al.	2410.07524	null
2024-10-09	MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts	Peng Jin et.al.	2410.07348	link
2024-10-09	Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders	David Noever et.al.	2410.06462	null
2024-10-09	Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs	Ruijia Niu et.al.	2410.06431	null
2024-10-08	Probing the Robustness of Theory of Mind in Large Language Models	Christian Nickel et.al.	2410.06271	null
2024-10-08	MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More	Wei Huang et.al.	2410.06270	link
2024-10-08	Aria: An Open Multimodal Native Mixture-of-Experts Model	Dongxu Li et.al.	2410.05993	link
2024-10-08	Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models	Siqi Wang et.al.	2410.05661	null
2024-10-07	Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild	Xinyu Zhao et.al.	2410.05357	link
2024-10-07	Multimodal Fusion Strategies for Mapping Biophysical Landscape Features	Lucia Gordon et.al.	2410.04833	link
2024-10-06	Realizing Video Summarization from the Path of Language-based Semantic Understanding	Kuan-Chen Mu et.al.	2410.04511	null
2024-10-09	Structure-Enhanced Protein Instruction Tuning: Towards General-Purpose Protein Understanding	Wei Wu et.al.	2410.03553	null
2024-10-04	Exploring the Benefit of Activation Sparsity in Pre-training	Zhengyan Zhang et.al.	2410.03440	link
2024-10-03	MLP-KAN: Unifying Deep Representation and Function Learning	Yunhong He et.al.	2410.03027	link
2024-10-03	On Expert Estimation in Hierarchical Mixture of Experts: Beyond Softmax Gating Functions	Huy Nguyen et.al.	2410.02935	null
2024-10-03	Neutral residues: revisiting adapters for model extension	Franck Signe Talla et.al.	2410.02744	null
2024-10-03	Efficient Residual Learning with Mixture-of-Experts for Universal Dexterous Grasping	Ziye Huang et.al.	2410.02475	null
2024-10-03	MIGA: Mixture-of-Experts with Group Aggregation for Stock Market Prediction	Zhaojian Yu et.al.	2410.02241	null
2024-10-03	Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts	Minh Le et.al.	2410.02200	link
2024-10-04	Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices	Andres Potapczynski et.al.	2410.02117	link
2024-10-04	EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing	Haotian Sun et.al.	2410.02098	null
2024-10-02	Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL	Ghada Sokar et.al.	2410.01930	null
2024-10-02	Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models	Shayekh Bin Islam et.al.	2410.01782	link
2024-10-02	Upcycling Instruction Tuning from Dense to Mixture-of-Experts via Parameter Merging	Tingfeng Hui et.al.	2410.01610	null
2024-10-02	The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs	Hong Li et.al.	2410.01417	null
2024-10-01	MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards	Sheng Wang et.al.	2410.00938	null
2024-10-01	UniAdapt: A Universal Adapter for Knowledge Calibration	Tai D. Nguyen et.al.	2410.00454	null
2024-10-01	Robust Traffic Forecasting against Spatial Shift over Years	Hongjun Wang et.al.	2410.00373	link
2024-09-29	IDEA: An Inverse Domain Expert Adaptation Based Active DNN IP Protection Method	Chaohui Xu et.al.	2410.00059	null
2024-09-30	MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning	Haotian Zhang et.al.	2409.20566	null
2024-10-02	CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling	Jihai Zhang et.al.	2409.19291	link
2024-09-27	SciDFM: A Large Language Model with Mixture-of-Experts for Science	Liangtai Sun et.al.	2409.18412	null
2024-09-26	Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE	Xun Zhu et.al.	2409.17508	link
2024-09-26	A Time Series is Worth Five Experts: Heterogeneous Mixture of Experts for Traffic Flow Prediction	Guangyu Wang et.al.	2409.17440	link
2024-09-24	Leveraging Mixture of Experts for Improved Speech Deepfake Detection	Viola Negroni et.al.	2409.16077	null
2024-10-02	Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts	Xiaoming Shi et.al.	2409.16040	link
2024-09-24	Boosting Code-Switching ASR with Mixture of Experts Enhanced Speech-Conditioned LLM	Fengrun Zhang et.al.	2409.15905	null
2024-09-24	Toward Mixture-of-Experts Enabled Trustworthy Semantic Communication for 6G Networks	Jiayi He et.al.	2409.15695	null
2024-09-23	A Gated Residual Kolmogorov-Arnold Networks for Mixtures of Experts	Hugo Inzirillo et.al.	2409.15161	link
2024-09-23	Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond	Hong Chen et.al.	2409.14993	null
2024-09-21	Routing in Sparsely-gated Language Models responds to Context	Stefan Arnold et.al.	2409.14107	null
2024-09-20	On-device Collaborative Language Modeling via a Mixture of Generalists and Specialists	Dongyang Fan et.al.	2409.13931	link
2024-09-20	Multi-omics data integration for early diagnosis of hepatocellular carcinoma (HCC) using machine learning	Annette Spooner et.al.	2409.13791	null
2024-09-19	Robust Audiovisual Speech Recognition Models with Mixture-of-Experts	Yihan Wu et.al.	2409.12370	null
2024-09-18	GRIN: GRadient-INformed MoE	Liyuan Liu et.al.	2409.12136	null
2024-09-18	Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0	Zhiyong Wang et.al.	2409.11909	link
2024-09-17	LPT++: Efficient Training on Mixture of Long-tailed Experts	Bowen Dong et.al.	2409.11323	null
2024-09-19	LOLA -- An Open-Source Massively Multilingual Large Language Model	Nikit Srivastava et.al.	2409.11272	link
2024-09-16	Adaptive Segmentation-Based Initialization for Steered Mixture of Experts Image Regression	Yi-Hsin Li et.al.	2409.10101	null
2024-09-14	MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving	Enming Zhang et.al.	2409.07267	link
2024-09-10	DA-MoE: Towards Dynamic Expert Allocation for Mixture-of-Experts Models	Maryam Akhavan Aghdam et.al.	2409.06669	null
2024-09-10	STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning	Jaeseong Lee et.al.	2409.06211	null
2024-09-10	VE: Modeling Multivariate Time Series Correlation with Variate Embedding	Shangjiong Wang et.al.	2409.06169	link
2024-09-09	Alt-MoE: Multimodal Alignment via Alternating Optimization of Multi-directional MoE with Unimodal Models	Hongyang Lei et.al.	2409.05929	link
2024-09-09	Optical Spiking Neurons Enable High-Speed and Energy-Efficient Optical Neural Networks	Bo Xu et.al.	2409.05726	null
2024-09-09	Adapted-MoE: Mixture of Experts with Test-Time Adaption for Anomaly Detection	Tianwu Lei et.al.	2409.05611	null
2024-09-05	Interpretable mixture of experts for time series prediction under recurrent and non-recurrent conditions	Zemian Ke et.al.	2409.03282	null
2024-09-05	ChartMoE: Mixture of Expert Connector for Advanced Chart Understanding	Zhengzhuo Xu et.al.	2409.03277	null
2024-09-05	xLAM: A Family of Large Action Models to Empower AI Agent Systems	Jianguo Zhang et.al.	2409.03215	link
2024-09-04	Configurable Foundation Models: Building LLMs from a Modular Perspective	Chaojun Xiao et.al.	2409.02877	null
2024-09-04	Pluralistic Salient Object Detection	Xuelu Feng et.al.	2409.02368	null
2024-09-03	OLMoE: Open Mixture-of-Experts Language Models	Niklas Muennighoff et.al.	2409.02060	link
2024-09-05	Enhancing Code-Switching Speech Recognition with LID-Based Collaborative Mixture of Experts Model	Hukai Huang et.al.	2409.02050	null
2024-09-02	Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning	Soumajyoti Sarkar et.al.	2409.01483	null
2024-09-02	Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching	Sungmin Yun et.al.	2409.01141	null
2024-09-04	Unveiling the Vulnerability of Private Fine-Tuning in Split-Based Frameworks for Large Language Models: A Bidirectionally Enhanced Attack	Guanzhong Chen et.al.	2409.00960	link
2024-09-02	Beyond Parameter Count: Implicit Bias in Soft Mixture of Experts	Youngseog Chung et.al.	2409.00879	null
2024-08-29	Gradient-free variational learning with conditional mixture networks	Conor Heins et.al.	2408.16429	link
2024-08-28	Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models	Yuncheng Yang et.al.	2408.15915	link
2024-08-28	Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts	Nikolas Gritsch et.al.	2408.15901	null
2024-08-28	LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation	Fangxun Shu et.al.	2408.15881	link
2024-08-28	Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts	Lean Wang et.al.	2408.15664	null
2024-08-27	Parameter-Efficient Quantized Mixture-of-Experts Meets Vision-Language Instruction Tuning for Semiconductor Electron Micrograph Analysis	Sakhinana Sagar Srinivas et.al.	2408.15305	null
2024-08-27	MRSE: An Efficient Multi-modality Retrieval System for Large Scale E-commerce	Hao Jiang et.al.	2408.14968	null
2024-08-24	Advancing Enterprise Spatio-Temporal Forecasting Applications: Data Mining Meets Instruction Tuning of Language Models For Multi-modal Time Series Analysis in Low-Resource Settings	Sagar Srinivas Sakhinana et.al.	2408.13622	null
2024-08-23	The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities	Venkatesh Balavadhani Parthasarathy et.al.	2408.13296	null
2024-08-23	Guiding IoT-Based Healthcare Alert Systems with Large Language Models	Yulan Gao et.al.	2408.13071	null
2024-08-23	DutyTTE: Deciphering Uncertainty in Origin-Destination Travel Time Estimation	Xiaowei Mao et.al.	2408.12809	link
2024-08-23	Multi-Treatment Multi-Task Uplift Modeling for Enhancing User Growth	Yuxiang Wei et.al.	2408.12803	null
2024-08-23	La-SoftMoE CLIP for Unified Physical-Digital Face Attack Detection	Hang Zou et.al.	2408.12793	null
2024-08-22	SQL-GEN: Bridging the Dialect Gap for Text-to-SQL Via Synthetic Data And Model Merging	Mohammadreza Pourreza et.al.	2408.12733	null
2024-08-22	Jamba-1.5: Hybrid Transformer-Mamba Models at Scale	Jamba Team et.al.	2408.12570	null
2024-08-22	Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators	Dingkang Yang et.al.	2408.12325	link
2024-08-21	MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing	Hao Zhou et.al.	2408.11396	link
2024-08-21	KAN4TSF: Are KAN and KAN-based models Effective for Time Series Forecasting?	Xiao Han et.al.	2408.11306	link
2024-08-21	FedMoE: Personalized Federated Learning via Heterogeneous Mixture of Experts	Hanzi Mei et.al.	2408.11304	null
2024-08-20	Unboxing Occupational Bias: Grounded Debiasing LLMs with U.S. Labor Data	Atmika Gorti et.al.	2408.11247	null
2024-08-20	Navigating Spatio-Temporal Heterogeneity: A Graph Transformer Approach for Traffic Forecasting	Jianxiang Zhou et.al.	2408.10822	link
2024-08-20	AnyGraph: Graph Foundation Model in the Wild	Lianghao Xia et.al.	2408.10700	link
2024-08-20	HMoE: Heterogeneous Mixture of Experts for Language Modeling	An Wang et.al.	2408.10681	null
2024-08-19	AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference	Shuzhang Zhong et.al.	2408.10284	link
2024-08-17	FEDKIM: Adaptive Federated Knowledge Injection into Medical Foundation Models	Xiaochen Wang et.al.	2408.10276	link
2024-08-19	Customizing Language Models with Instance-wise LoRA for Sequential Recommendation	Xiaoyu Kong et.al.	2408.10159	link
2024-08-19	A Unified Framework for Iris Anti-Spoofing: Introducing IrisGeneral Dataset and Masked-MoE Method	Hang Zou et.al.	2408.09752	null
2024-08-16	Integrating Multi-view Analysis: Multi-view Mixture-of-Expert for Textual Personality Detection	Haohao Zhu et.al.	2408.08551	link
2024-08-17	BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts	Qizhen Zhang et.al.	2408.08274	null
2024-08-14	Beyond Inter-Item Relations: Dynamic Adaptive Mixture-of-Experts for LLM-Based Sequential Recommendation	CanYi Liu et.al.	2408.07427	null
2024-08-13	A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning	Prateek Yadav et.al.	2408.07057	null
2024-08-13	Layerwise Recurrent Router for Mixture-of-Experts	Zihan Qiu et.al.	2408.06793	link
2024-08-13	AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies	Bo-Wen Zhang et.al.	2408.06567	null
2024-08-10	HoME: Hierarchy of Multi-Gate Experts for Multi-Task Learning at Kuaishou	Xu Wang et.al.	2408.05430	null
2024-08-08	Understanding the Performance and Estimating the Cost of LLM Fine-Tuning	Yuchen Xia et.al.	2408.04693	link
2024-08-08	Partial Experts Checkpoint: Efficient Fault Tolerance for Sparse Mixture-of-Experts Model Training	Weilin Cai et.al.	2408.04307	null
2024-08-08	LaDiMo: Layer-wise Distillation Inspired MoEfier	Sungyoon Kim et.al.	2408.04278	null
2024-08-07	MoExtend: Tuning New Experts for Modality and Task Extension	Shanshan Zhong et.al.	2408.03511	link
2024-08-05	Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and Localization	Changtao Miao et.al.	2408.02306	null
2024-08-02	HMDN: Hierarchical Multi-Distribution Network for Click-Through Rate Prediction	Xingyu Lou et.al.	2408.01332	null
2024-08-01	Multimodal Fusion and Coherence Modeling for Video Topic Segmentation	Hai Yu et.al.	2408.00365	null
2024-08-12	MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts	Xi Victoria Lin et.al.	2407.21770	null
2024-07-31	PMoE: Progressive Mixture of Experts with Asymmetric Transformer for Continual Learning	Min Jae Jung et.al.	2407.21571	null
2024-07-30	Distribution Learning for Molecular Regression	Nima Shoghi et.al.	2407.20475	null
2024-07-29	Time series forecasting with high stakes: A field study of the air cargo industry	Abhinav Garg et.al.	2407.20192	null
2024-07-30	Mixture of Nested Experts: Adaptive Processing of Visual Tokens	Gagan Jain et.al.	2407.19985	null
2024-07-28	Mixture of Modular Experts: Distilling Knowledge from a Multilingual Teacher into Specialized Modular Language Models	Mohammed Al-Maamari et.al.	2407.19610	link
2024-07-26	Wolf: Captioning Everything with a World Summarization Framework	Boyi Li et.al.	2407.18908	null
2024-07-26	MOoSE: Multi-Orientation Sharing Experts for Open-set Scene Text Recognition	Chang Liu et.al.	2407.18616	link
2024-07-26	Dynamic Language Group-Based MoE: Enhancing Efficiency and Flexibility for Code-Switching Speech Recognition	Hukai Huang et.al.	2407.18581	link
2024-07-25	How Lightweight Can A Vision Transformer Be	Jen Hong Tan et.al.	2407.17783	null
2024-07-24	Exploring Domain Robust Lightweight Reward Models based on Router Mechanism	Hyuk Namgoong et.al.	2407.17546	null
2024-07-24	M4: Multi-Proxy Multi-Gate Mixture of Experts Network for Multiple Instance Learning in Histopathology Image Analysis	Junyu Li et.al.	2407.17267	link
2024-07-25	Cheems: Wonderful Matrices More Efficient and More Effective Architecture	Jingze Shi et.al.	2407.16958	null
2024-07-22	Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget	Vikash Sehwag et.al.	2407.15811	link
2024-07-22	Norface: Improving Facial Expression Analysis by Identity Normalization	Hanwei Liu et.al.	2407.15617	link
2024-07-19	Mixture of Experts with Mixture of Precisions for Tuning Quality of Service	HamidReza Imani et.al.	2407.14417	null
2024-07-19	EVLM: An Efficient Vision-Language Model for Visual Understanding	Kaibing Chen et.al.	2407.14177	null
2024-07-19	Routing Experts: Learning to Route Dynamic Experts in Multi-modal Large Language Models	Qiong Wu et.al.	2407.14093	null
2024-07-18	Discussion: Effective and Interpretable Outcome Prediction by Training Sparse Mixtures of Linear Experts	Francesco Folino et.al.	2407.13526	null
2024-07-18	Mixture of Experts based Multi-task Supervise Learning from Crowds	Tao Han et.al.	2407.13268	null
2024-07-15	MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration	Yulin Ren et.al.	2407.10833	null
2024-07-18	Qwen2 Technical Report	An Yang et.al.	2407.10671	link
2024-07-15	Boost Your NeRF: A Model-Agnostic Mixture of Experts Framework for High Quality and Efficient Rendering	Francesco Di Sario et.al.	2407.10389	null
2024-07-13	Low-Rank Interconnected Adaptation Across Layers	Yibo Zhong et.al.	2407.09946	link
2024-07-13	MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts	Zhenpeng Su et.al.	2407.09816	link
2024-07-12	Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts	Zeliang Zhang et.al.	2407.09590	null
2024-07-11	An Unsupervised Domain Adaptation Method for Locating Manipulated Region in partially fake Audio	Siding Zeng et.al.	2407.08239	null
2024-07-10	MoVEInt: Mixture of Variational Experts for Learning Human-Robot Interactions from Demonstrations	Vignesh Prasad et.al.	2407.07636	link
2024-07-10	Swin SMT: Global Sequential Modeling in 3D Medical Image Segmentation	Szymon Płotka et.al.	2407.07514	link
2024-07-09	A Simple Architecture for Enterprise Large Language Model Applications based on Role based security and Clearance Levels using Retrieval-Augmented Generation or Mixture of Experts	Atilla Özgür et.al.	2407.06718	null
2024-07-06	SAM-Med3D-MoE: Towards a Non-Forgetting Segment Anything Model via Mixture of Experts for 3D Medical Image Segmentation	Guoan Wang et.al.	2407.04938	null
2024-07-06	Completed Feature Disentanglement Learning for Multimodal MRIs Analysis	Tianling Liu et.al.	2407.04916	link
2024-07-05	YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer Architectures and Cross-dataset Stem Augmentation	Sungkyun Chang et.al.	2407.04822	link
2024-07-05	Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models with Adaptive Expert Placement	Yongji Wu et.al.	2407.04656	null
2024-07-05	MobileFlow: A Multimodal LLM For Mobile GUI Agent	Songqin Nong et.al.	2407.04346	null
2024-07-04	Mixture of A Million Experts	Xu Owen He et.al.	2407.04153	null
2024-07-02	Terminating Differentiable Tree Experts	Jonathan Thomm et.al.	2407.02060	null
2024-07-05	Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models	Zihan Wang et.al.	2407.01906	link
2024-07-01	Uncertainty Quantification in Table Structure Recognition	Kehinde Ajayi et.al.	2407.01731	link
2024-07-01	Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning	Yixiao Wang et.al.	2407.01531	null
2024-07-01	Investigating the potential of Sparse Mixtures-of-Experts for multi-domain neural machine translation	Nadezhda Chirkova et.al.	2407.01126	null
2024-07-01	Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs	Enshu Liu et.al.	2407.00945	link
2024-07-03	Parm: Efficient Training of Large Sparsely-Activated Models with Dedicated Schedules	Xinglin Pan et.al.	2407.00599	link
2024-07-02	One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts	Ruochen Wang et.al.	2407.00256	null
2024-06-28	LEMoE: Advanced Mixture of Experts Adaptor for Lifelong Model Editing of Large Language Models	Renzhi Wang et.al.	2406.20030	null
2024-06-28	Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model	Longrong Yang et.al.	2406.19905	link
2024-06-28	SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR	Qiuming Zhao et.al.	2406.19706	link
2024-06-27	A Teacher Is Worth A Million Instructions	Nikhil Kothari et.al.	2406.19112	link
2024-06-27	Towards Personalized Federated Multi-scenario Multi-task Recommendation	Yue Ding et.al.	2406.18938	null
2024-06-26	Mixture of Experts in a Mixture of RL settings	Timon Willi et.al.	2406.18420	null
2024-06-26	A Closer Look into Mixture-of-Experts in Large Language Models	Ka Man Lo et.al.	2406.18219	link
2024-06-26	SC-MoE: Switch Conformer Mixture of Experts for Unified Streaming and Non-streaming Code-Switching ASR	Shuaishuai Ye et.al.	2406.18021	null
2024-06-24	Peirce in the Machine: How Mixture of Experts Models Perform Hypothesis Construction	Bruce Rushing et.al.	2406.17150	link
2024-06-24	LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training	Tong Zhu et.al.	2406.16554	link
2024-06-25	OTCE: Hybrid SSM and Attention with Cross Domain Mixture of Experts to construct Observer-Thinker-Conceiver-Expresser	Jingze Shi et.al.	2406.16495	link
2024-06-24	Theory on Mixture-of-Experts in Continual Learning	Hongbo Li et.al.	2406.16437	null
2024-06-22	SimSMoE: Solving Representational Collapse via Similarity Measure	Giang Do et.al.	2406.15883	null
2024-06-20	Voice Disorder Analysis: a Transformer-based Approach	Alkis Koudounas et.al.	2406.14693	link
2024-06-19	Low-Rank Mixture-of-Experts for Continual Medical Image Segmentation	Qian Chen et.al.	2406.13583	null
2024-06-19	AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models	Zihao Zeng et.al.	2406.13233	link
2024-06-18	Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts	Haoxiang Wang et.al.	2406.12845	link
2024-06-18	P-Tailor: Customizing Personality Traits for Language Models via Mixture of Specialized LoRA Experts	Yuhao Dan et.al.	2406.12548	null
2024-06-18	Variational Distillation of Diffusion Policies into Mixture of Experts	Hongyi Zhou et.al.	2406.12538	null
2024-06-18	GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory	Haoze Wu et.al.	2406.12375	link
2024-06-17	Not Eliminate but Aggregate: Post-Hoc Control over Mixture-of-Experts to Address Shortcut Shifts in Natural Language Understanding	Ukyo Honda et.al.	2406.12060	link
2024-06-17	DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence	DeepSeek-AI et.al.	2406.11931	link
2024-06-17	Graph Knowledge Distillation to Mixture of Experts	Pavel Rumiantsev et.al.	2406.11919	link
2024-06-17	MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts	Guanjie Chen et.al.	2406.11353	link
2024-06-17	Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts	Tong Zhu et.al.	2406.11256	link
2024-06-14	Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion	Anke Tang et.al.	2406.09770	link
2024-06-13	DeepUnifiedMom: Unified Time-series Momentum Portfolio Construction via Multi-Task Learning with Multi-Gate Mixture of Experts	Joel Ong et.al.	2406.08742	link
2024-06-12	Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark	Pingzhi Li et.al.	2406.08155	link
2024-06-11	Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters	Yixin Song et.al.	2406.05955	null
2024-06-08	Flexible and Adaptable Summarization via Expertise Separation	Xiuying Chen et.al.	2406.05360	link
2024-06-07	MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter	Jitai Hao et.al.	2406.04984	link
2024-06-07	MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks	Xingkui Zhu et.al.	2406.04801	link
2024-06-05	Style Mixture of Experts for Expressive Text-To-Speech Synthesis	Ahad Jawaid et.al.	2406.03637	null
2024-06-05	Node-wise Filtering in Graph Neural Networks: A Mixture of Experts Approach	Haoyu Han et.al.	2406.03464	null
2024-06-05	Continual Traffic Forecasting via Mixture of Experts	Sanghyun Lee et.al.	2406.03140	null
2024-06-05	Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models	Raeid Saqur et.al.	2406.02969	null
2024-06-04	Parrot: Multilingual Visual Instruction Tuning	Hai-Long Sun et.al.	2406.02539	link
2024-06-04	Demystifying the Compression of Mixture-of-Experts Through a Unified Framework	Shwai He et.al.	2406.02500	link
2024-06-02	Reservoir History Matching of the Norne field with generative exotic priors and a coupled Mixture of Experts -- Physics Informed Neural Operator Forward Model	Clement Etienam et.al.	2406.00889	link
2024-06-01	A Gaussian Process-based Streaming Algorithm for Prediction of Time Series With Regimes and Outliers	Daniel Waxman et.al.	2406.00570	link
2024-06-01	Optimizing 6G Integrated Sensing and Communications (ISAC) via Expert Networks	Jiacheng Wang et.al.	2406.00408	null
2024-05-30	Low-dimensional approximations of the conditional law of Volterra processes: a non-positive curvature approach	Reza Arabpour et.al.	2405.20094	null
2024-06-02	MEMoE: Enhancing Model Editing with Mixture of Experts Adaptors	Renzhi Wang et.al.	2405.19086	null
2024-06-02	Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design	Markus J. Buehler et.al.	2405.19076	link
2024-05-29	Learning Mixture-of-Experts for General-Purpose Black-Box Discrete Optimization	Shengcai Liu et.al.	2405.18884	link
2024-05-29	MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models	Taehyun Kim et.al.	2405.18832	null
2024-05-29	Yuan 2.0-M32: Mixture of Experts with Attention Router	Shaohua Wu et.al.	2405.17976	link
2024-05-28	LoRA-Switch: Boosting the Efficiency of Dynamic LLM Adapters via System-Algorithm Co-design	Rui Kong et.al.	2405.17741	null
2024-05-27	Enhancing Fast Feed Forward Networks with Load Balancing and a Master Leaf Node	Andreas Charalampopoulos et.al.	2405.16836	link
2024-05-26	Mixture of Experts Using Tensor Products	Zhan Su et.al.	2405.16671	link
2024-05-30	A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts	Mohammed Nowaz Rabbani Chowdhury et.al.	2405.16646	null
2024-05-26	Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation	Rongyu Zhang et.al.	2405.16486	link
2024-05-25	MoEUT: Mixture-of-Experts Universal Transformers	Róbert Csordás et.al.	2405.16039	link
2024-05-23	Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training	Xianzhi Du et.al.	2405.15052	link
2024-05-23	Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast	Chufan Shi et.al.	2405.14507	link
2024-05-23	Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models	Yongxin Guo et.al.	2405.14297	link
2024-05-23	Graph Sparsification via Mixture of Graphs	Guibin Zhang et.al.	2405.14260	link
2024-05-23	Statistical Advantages of Perturbing Cosine Router in Sparse Mixture of Experts	Huy Nguyen et.al.	2405.14131	null
2024-05-23	Mixture of Experts Meets Prompt-Based Continual Learning	Minh Le et.al.	2405.14124	link
2024-05-22	Sigmoid Gating is More Sample Efficient than Softmax Gating in Mixture of Experts	Huy Nguyen et.al.	2405.13997	null
2024-05-22	xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token	Xin Cheng et.al.	2405.13792	link
2024-05-24	MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models	Jingwei Xu et.al.	2405.13053	link
2024-05-21	Optimizing Generative AI Networking: A Dual Perspective with Multi-Agent Systems and Mixture of Experts	Ruichen Zhang et.al.	2405.12472	null
2024-05-21	Ensemble and Mixture-of-Experts DeepONets For Operator Learning	Ramansh Sharma et.al.	2405.11907	link
2024-05-19	Learning More Generalized Experts by Merging Experts in Mixture-of-Experts	Sejik Park et.al.	2405.11530	null
2024-05-18	Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts	Yunxin Li et.al.	2405.11273	link
2024-05-16	Many Hands Make Light Work: Task-Oriented Dialogue System with Module-Based Mixture-of-Experts	Ruolin Su et.al.	2405.09744	null
2024-05-15	M$^4$oE: A Foundation Model for Medical Multimodal Image Segmentation with Mixture of Experts	Yufeng Jiang et.al.	2405.09446	link
2024-05-13	Harnessing Hierarchical Label Distribution Variations in Test Agnostic Long-tail Recognition	Zhiyong Yang et.al.	2405.07780	link
2024-05-07	SUTRA: Scalable Multilingual Language Model Architecture	Abhijit Bendale et.al.	2405.06694	null
2024-05-09	A Mixture of Experts Approach to 3D Human Motion Prediction	Edmund Shieh et.al.	2405.06088	link
2024-05-09	A Mixture-of-Experts Approach to Few-Shot Task Transfer in Open-Ended Text Worlds	Christopher Z. Cui et.al.	2405.06059	null
2024-05-09	EWMoE: An effective model for global weather forecasting with mixture-of-experts	Lihao Gan et.al.	2405.06004	link
2024-05-09	CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts	Jiachen Li et.al.	2405.05949	link
2024-05-16	DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model	DeepSeek-AI et.al.	2405.04434	link
2024-05-07	Enhancing Physical Layer Communication Security through Generative AI with Mixture of Experts	Changyuan Zhao et.al.	2405.04198	null
2024-05-06	Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training	Zexuan Zhong et.al.	2405.03133	null
2024-05-06	WDMoE: Wireless Distributed Large Language Models with Mixture of Experts	Nan Xue et.al.	2405.03131	null
2024-05-31	Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models	Xudong Lu et.al.	2402.14800	null
2024-10-29	GraphMETRO: Mitigating Complex Graph Distribution Shifts via Mixture of Aligned Experts	Shirley Wu et.al.	2312.04693	null
2021-05-25	Tensor-variate Mixture of Experts for Proportional Myographic Control of a Robotic Hand	Noémie Jaquier et.al.	1902.11104	null
2018-06-22	Mixtures of Experts Models	Isobel Claire Gormley et.al.	1806.08200	null

(back to top)

