| 2026-03-06 |
RAMoEA-QA: Hierarchical Specialization for Robust Respiratory Audio Question Answering |
Gaia A. Bertolino et.al. |
2603.06542 |
null |
| 2026-03-06 |
A Mixture-of-Experts Framework for Practical Hybrid-Quantum Models in Credit Card Fraud Detection |
Rodrigo Chaves et.al. |
2603.06473 |
null |
| 2026-03-06 |
MoEMambaMIL: Structure-Aware Selective State Space Modeling for Whole-Slide Image Analysis |
Dongqing Xie et.al. |
2603.06378 |
null |
| 2026-03-06 |
MoEless: Efficient MoE LLM Serving via Serverless Computing |
Hanfei Yu et.al. |
2603.06350 |
null |
| 2026-03-06 |
WMoE-CLIP: Wavelet-Enhanced Mixture-of-Experts Prompt Learning for Zero-Shot Anomaly Detection |
Peng Chen et.al. |
2603.06313 |
null |
| 2026-03-06 |
GazeMoE: Perception of Gaze Target with Mixture-of-Experts |
Zhuangzhuang Dai et.al. |
2603.06256 |
null |
| 2026-03-06 |
EvoESAP: Non-Uniform Expert Pruning for Sparse MoE |
Zongfang Liu et.al. |
2603.06003 |
null |
| 2026-03-06 |
MoE Lens -- An Expert Is All You Need |
Marmik Chaudhari et.al. |
2603.05806 |
null |
| 2026-03-06 |
Sparse Crosscoders for diffing MoEs and Dense models |
Marmik Chaudhari et.al. |
2603.05805 |
null |
| 2026-03-05 |
Change Point Detection for Cell Populations Measured via Flow Cytometry |
Yik Lun Kei et.al. |
2603.05700 |
null |
| 2026-03-05 |
NeuronMoE: Neuron-Guided Mixture-of-Experts for Efficient Multilingual LLM Extension |
Rongzhi Li et.al. |
2603.05046 |
null |
| 2026-03-05 |
Mixture of Universal Experts: Scaling Virtual Width via Depth-Width Transformation |
Yilong Chen et.al. |
2603.04971 |
null |
| 2026-03-05 |
Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling |
Yong Liu et.al. |
2603.04791 |
null |
| 2026-03-05 |
TSEmbed: Unlocking Task Scaling in Universal Multimodal Embeddings |
Yebo Wu et.al. |
2603.04772 |
null |
| 2026-03-04 |
ECG-MoE: Mixture-of-Expert Electrocardiogram Foundation Model |
Yuhao Xu et.al. |
2603.04589 |
null |
| 2026-03-04 |
Augmenting representations with scientific papers |
Nicolò Oreste Pinciroli Vago et.al. |
2603.04516 |
null |
| 2026-03-04 |
Benchmarking Quantum Computers via Protocols, Comparing IBM's Heron vs IBM's Eagle |
Nitay Mayo et.al. |
2603.04377 |
null |
| 2026-03-04 |
RANGER: Sparsely-Gated Mixture-of-Experts with Adaptive Retrieval Re-ranking for Pathology Report Generation |
Yixin Chen et.al. |
2603.04348 |
null |
| 2026-03-04 |
CAMMSR: Category-Guided Attentive Mixture of Experts for Multimodal Sequential Recommendation |
Jinfeng Xu et.al. |
2603.04320 |
null |
| 2026-03-04 |
UniRain: Unified Image Deraining with RAG-based Dataset Distillation and Multi-objective Reweighted Optimization |
Qianfeng Yang et.al. |
2603.03967 |
null |
| 2026-03-03 |
Modeling Cross-vision Synergy for Unified Large Vision Model |
Shengqiong Wu et.al. |
2603.03564 |
null |
| 2026-03-03 |
Beyond Language Modeling: An Exploration of Multimodal Pretraining |
Shengbang Tong et.al. |
2603.03276 |
null |
| 2026-03-04 |
MoECLIP: Patch-Specialized Experts for Zero-shot Anomaly Detection |
Jun Yeong Park et.al. |
2603.03101 |
null |
| 2026-03-03 |
CMoE: Contrastive Mixture of Experts for Motion Control and Terrain Adaptation of Humanoid Robots |
Shihao Ma et.al. |
2603.03067 |
null |
| 2026-03-03 |
EduVQA: Benchmarking AI-Generated Video Quality Assessment for Education |
Baoliang Chen et.al. |
2603.03066 |
null |
| 2026-03-03 |
Practical FP4 Training for Large-Scale MoE Models on Hopper GPUs |
Wuyue Zhang et.al. |
2603.02731 |
null |
| 2026-03-03 |
TenExp: Mixture-of-Experts-Based Tensor Decomposition Structure Search Framework |
Ting-Wei Zhou et.al. |
2603.02720 |
null |
| 2026-03-03 |
MiM-DiT: MoE in MoE with Diffusion Transformers for All-in-One Image Restoration |
Lingshun Kong et.al. |
2603.02710 |
null |
| 2026-03-03 |
Addressing Missing and Noisy Modalities in One Solution: Unified Modality-Quality Framework for Low-quality Multimodal Data |
Sijie Mai et.al. |
2603.02695 |
null |
| 2026-03-03 |
Robust Heterogeneous Analog-Digital Computing for Mixture-of-Experts Models with Theoretical Generalization Guarantees |
Mohammed Nowaz Rabbani Chowdhury et.al. |
2603.02633 |
null |
| 2026-03-02 |
DynaMoE: Dynamic Token-Level Expert Activation with Layer-Wise Adaptive Capacity for Mixture-of-Experts Neural Networks |
Gökdeniz Gülmez et.al. |
2603.01697 |
null |
| 2026-03-02 |
PathMoE: Interpretable Multimodal Interaction Experts for Pediatric Brain Tumor Classification |
Jian Yu et.al. |
2603.01547 |
null |
| 2026-03-02 |
Multimodal Mixture-of-Experts with Retrieval Augmentation for Protein Active Site Identification |
Jiayang Wu et.al. |
2603.01511 |
null |
| 2026-03-02 |
UETrack: A Unified and Efficient Framework for Single Object Tracking |
Ben Kang et.al. |
2603.01412 |
null |
| 2026-03-02 |
Fed-GAME: Personalized Federated Learning with Graph Attention Mixture-of-Experts For Time-Series Forecasting |
Yi Li et.al. |
2603.01363 |
null |
| 2026-03-01 |
Truth as a Trajectory: What Internal Representations Reveal About Large Language Model Reasoning |
Hamed Damirchi et.al. |
2603.01326 |
null |
| 2026-03-01 |
TriMoE: Augmenting GPU with AMX-Enabled CPU and DIMM-NDP for High-Throughput MoE Inference via Offloading |
Yudong Pan et.al. |
2603.01058 |
null |
| 2026-03-01 |
Dr.Occ: Depth- and Region-Guided 3D Occupancy from Surround-View Cameras for Autonomous Driving |
Xubo Zhu et.al. |
2603.01007 |
null |
| 2026-02-28 |
MME: Mixture of Mesh Experts with Random Walk Transformer Gating |
Amir Belder et.al. |
2603.00828 |
null |
| 2026-02-27 |
Quant Experts: Token-aware Adaptive Error Reconstruction with Mixture of Experts for Large Vision-Language Models Quantization |
Chenwei Jia et.al. |
2602.24059 |
null |
| 2026-02-26 |
Brain-OF: An Omnifunctional Foundation Model for fMRI, EEG and MEG |
Hanning Guo et.al. |
2602.23410 |
null |
| 2026-02-26 |
A Mixture-of-Experts Model for Multimodal Emotion Recognition in Conversations |
Soumya Dutta et.al. |
2602.23300 |
null |
| 2026-02-26 |
Learning Physical Operators using Neural Operators |
Vignesh Gopakumar et.al. |
2602.23113 |
null |
| 2026-02-26 |
pMoE: Prompting Diverse Experts Together Wins More in Visual Adaptation |
Shentong Mo et.al. |
2602.22938 |
null |
| 2026-02-26 |
Switch-Hurdle: A MoE Encoder with AR Hurdle Decoder for Intermittent Demand Forecasting |
Fabian Muşat et.al. |
2602.22685 |
null |
| 2026-02-26 |
Predictive variational inference for flexible regression models |
Lucas Kock et.al. |
2602.22582 |
null |
| 2026-02-25 |
NESTOR: A Nested MOE-based Neural Operator for Large-Scale PDE Pre-Training |
Dengdi Sun et.al. |
2602.22059 |
null |
| 2026-02-25 |
Excitation: Momentum For Experts |
Sagi Shaier et.al. |
2602.21798 |
null |
| 2026-02-25 |
Learning from Yesterday's Error: An Efficient Online Learning Method for Traffic Demand Prediction |
Xiannan Huang et.al. |
2602.21757 |
null |
| 2026-02-25 |
TiMi: Empower Time Series Transformers with Multimodal Mixture of Experts |
Jiafeng Lin et.al. |
2602.21693 |
null |
| 2026-02-25 |
Multi-Layer Scheduling for MoE-Based LLM Reasoning |
Yifan Sun et.al. |
2602.21626 |
null |
| 2026-02-24 |
Dual-Branch INS/GNSS Fusion with Inequality and Equality Constraints |
Mor Levenhar et.al. |
2602.21266 |
null |
| 2026-02-25 |
GeCo-SRT: Geometry-aware Continual Adaptation for Robotic Cross-Task Sim-to-Real Transfer |
Wenbo Yu et.al. |
2602.20871 |
null |
| 2026-02-24 |
Wireless Federated Multi-Task LLM Fine-Tuning via Sparse-and-Orthogonal LoRA |
Nuocheng Yang et.al. |
2602.20492 |
null |
| 2026-02-23 |
The Universal Eccentricity Distribution for Dynamical Gravitational-Wave Merger Channels |
Mor Rozner et.al. |
2602.20110 |
null |
| 2026-02-23 |
Counterfactual Understanding via Retrieval-aware Multimodal Modeling for Time-to-Event Survival Prediction |
Ha-Anh Hoang Nguyen et.al. |
2602.19987 |
null |
| 2026-02-23 |
A Replicate-and-Quantize Strategy for Plug-and-Play Load Balancing of Sparse Mixture-of-Experts LLMs |
Zijie Liu et.al. |
2602.19938 |
null |
| 2026-02-23 |
Towards Dexterous Embodied Manipulation via Deep Multi-Sensory Fusion and Sparse Expert Scaling |
Yirui Sun et.al. |
2602.19764 |
null |
| 2026-02-23 |
RAID: Retrieval-Augmented Anomaly Detection |
Mingxiu Cai et.al. |
2602.19611 |
null |
| 2026-02-23 |
Conversational AI for Automated Patient Questionnaire Completion: Development Insights and Design Principles |
David Fraile Navarro et.al. |
2602.19507 |
null |
| 2026-02-23 |
EMS-FL: Federated Tuning of Mixture-of-Experts in Satellite-Terrestrial Networks via Expert-Driven Model Splitting |
Angzi Xu et.al. |
2602.19485 |
null |
| 2026-02-22 |
Robust Exploration in Directed Controller Synthesis via Reinforcement Learning with Soft Mixture-of-Experts |
Toshihide Ubukata et.al. |
2602.19244 |
null |
| 2026-02-22 |
SegMoTE: Token-Level Mixture of Experts for Medical Image Segmentation |
Yujie Lu et.al. |
2602.19213 |
null |
| 2026-02-22 |
JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation |
Kai Liu et.al. |
2602.19163 |
null |
| 2026-02-22 |
Routing-Aware Explanations for Mixture of Experts Graph Models in Malware Detection |
Hossein Shokouhinejad et.al. |
2602.19025 |
null |
| 2026-02-21 |
Give Users the Wheel: Towards Promptable Recommendation Paradigm |
Fuyuan Lyu et.al. |
2602.18929 |
null |
| 2026-02-20 |
Going Down Memory Lane: Scaling Tokens for Video Stream Understanding with Dynamic KV-Cache Memory |
Vatsal Agarwal et.al. |
2602.18434 |
null |
| 2026-02-19 |
Grassmannian Mixture-of-Experts: Concentration-Controlled Routing on Subspace Manifolds |
Ibne Farabi Shihab et.al. |
2602.17798 |
null |
| 2026-02-19 |
Phase-Aware Mixture of Experts for Agentic Reinforcement Learning |
Shengtian Yang et.al. |
2602.17038 |
null |
| 2026-02-19 |
Arcee Trinity Large Technical Report |
Varun Singh et.al. |
2602.17004 |
null |
| 2026-02-18 |
Federated Graph AGI for Cross-Border Insider Threat Intelligence in Government Financial Schemes |
Srikumar Nayak et.al. |
2602.16109 |
null |
| 2026-02-17 |
MoE-Spec: Expert Budgeting for Efficient Speculative Decoding |
Bradley McDanel et.al. |
2602.16052 |
null |
| 2026-02-17 |
ExpertWeaver: Unlocking the Inherent MoE in Dense LLMs with GLU Activation Patterns |
Ziyu Zhao et.al. |
2602.15521 |
null |
| 2026-02-16 |
Mixture-of-Experts under Finite-Rate Gating: Communication--Generalization Trade-offs |
Ali Khalesi et.al. |
2602.15091 |
null |
| 2026-02-15 |
DeepFusion: Accelerating MoE Training via Federated Knowledge Distillation from Heterogeneous Edge Devices |
Songyuan Li et.al. |
2602.14301 |
null |
| 2026-02-15 |
MILD: Multi-Intent Learning and Disambiguation for Proactive Failure Prediction in Intent-based Networking |
Md. Kamrul Hossain et.al. |
2602.14283 |
null |
| 2026-02-15 |
Multi-Agent Debate: A Unified Agentic Framework for Tabular Anomaly Detection |
Pinqiao Wang et.al. |
2602.14251 |
null |
| 2026-02-15 |
Synergistic Intra- and Cross-Layer Regularization Losses for MoE Expert Specialization |
Rizhen Hu et.al. |
2602.14159 |
null |
| 2026-02-15 |
LM-Lexicon: Improving Definition Modeling via Harmonizing Semantic Experts |
Yang Liu et.al. |
2602.14060 |
null |
| 2026-02-15 |
Geometry-Preserving Aggregation for Mixture-of-Experts Embedding Models |
Sajjad Kachuee et.al. |
2602.14039 |
null |
| 2026-02-15 |
Eureka-Audio: Triggering Audio Intelligence in Compact Language Models |
Dan Zhang et.al. |
2602.13954 |
null |
| 2026-02-14 |
Mixture-of-experts Wishart model for covariance matrices with an application to Cancer drug screening |
The Tien Mai et.al. |
2602.13888 |
null |
| 2026-02-13 |
Mixture of Predefined Experts: Maximizing Data Usage on Vertical Federated Learning |
Jon Irureta et.al. |
2602.12708 |
null |
| 2026-02-13 |
Multi-Head Attention as a Source of Catastrophic Forgetting in MoE Transformers |
Anrui Chen et.al. |
2602.12587 |
null |
| 2026-02-13 |
SD-MoE: Spectral Decomposition for Effective Expert Specialization |
Ruijun Huang et.al. |
2602.12556 |
null |
| 2026-02-13 |
Decoder-only Conformer with Modality-aware Sparse Mixtures of Experts for ASR |
Jaeyoung Lee et.al. |
2602.12546 |
null |
| 2026-02-12 |
Extending Puzzle for Mixture-of-Experts Reasoning Models with Application to GPT-OSS Acceleration |
Akhiad Bercovich et.al. |
2602.11937 |
null |
| 2026-02-12 |
LAER-MoE: Load-Adaptive Expert Re-layout for Efficient Mixture-of-Experts Training |
Xinyi Liu et.al. |
2602.11686 |
null |
| 2026-02-12 |
Evolutionary Router Feature Generation for Zero-Shot Graph Anomaly Detection with Mixture-of-Experts |
Haiyang Jiang et.al. |
2602.11622 |
null |
| 2026-02-12 |
Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm |
Jinrui Zhang et.al. |
2602.11543 |
null |
| 2026-02-11 |
Demonstration and performance of an online data selection algorithm for liquid argon time projection chambers using MicroBooNE |
MicroBooNE collaboration et.al. |
2602.11138 |
null |
| 2026-02-11 |
MoEEdit: Efficient and Routing-Stable Knowledge Editing for Mixture-of-Experts LLMs |
Yupu Gu et.al. |
2602.10965 |
null |
| 2026-02-11 |
CMAD: Cooperative Multi-Agent Diffusion via Stochastic Optimal Control |
Riccardo Barbano et.al. |
2602.10933 |
null |
| 2026-02-11 |
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training |
Guobin Shen et.al. |
2602.10693 |
null |
| 2026-02-11 |
Multimodal Priors-Augmented Text-Driven 3D Human-Object Interaction Generation |
Yin Wang et.al. |
2602.10659 |
null |
| 2026-02-11 |
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters |
Ailin Huang et.al. |
2602.10604 |
null |
| 2026-02-11 |
Neural Additive Experts: Context-Gated Experts for Controllable Model Additivity |
Guangzhi Xiong et.al. |
2602.10585 |
null |
| 2026-02-10 |
Area-Efficient In-Memory Computing for Mixture-of-Experts via Multiplexing and Caching |
Hanyuan Gao et.al. |
2602.10254 |
null |
| 2026-02-10 |
Diverse Skill Discovery for Quadruped Robots via Unsupervised Learning |
Ruopeng Cui et.al. |
2602.09767 |
null |
| 2026-02-10 |
DR.Experts: Differential Refinement of Distortion-Aware Experts for Blind Image Quality Assessment |
Bohan Fu et.al. |
2602.09531 |
null |
| 2026-02-10 |
SMES: Towards Scalable Multi-Task Recommendation via Expert Sparsity |
Yukun Zhang et.al. |
2602.09386 |
null |
| 2026-02-10 |
Effective MoE-based LLM Compression by Exploiting Heterogeneous Inter-Group Experts Routing Frequency and Information Density |
Zhendong Mi et.al. |
2602.09316 |
null |
| 2026-02-09 |
Generalizing GNNs with Tokenized Mixture of Experts |
Xiaoguang Guo et.al. |
2602.09258 |
null |
| 2026-02-09 |
UI-Venus-1.5 Technical Report |
Veuns-Team et.al. |
2602.09082 |
null |
| 2026-02-09 |
DirMoE: Dirichlet-routed Mixture of Experts |
Amirhossein Vahidi et.al. |
2602.09001 |
null |
| 2026-02-09 |
OmniReview: A Large-scale Benchmark and LLM-enhanced Framework for Realistic Reviewer Recommendation |
Yehua Huang et.al. |
2602.08896 |
null |
| 2026-02-09 |
FlexMoRE: A Flexible Mixture of Rank-heterogeneous Experts for Efficient Federatedly-trained Large Language Models |
Annemette Brok Pirchert et.al. |
2602.08818 |
null |
| 2026-02-10 |
MOVA: Towards Scalable and Synchronized Video-Audio Generation |
SII-OpenMOSS Team et.al. |
2602.08794 |
null |
| 2026-02-09 |
Redundancy-Free View Alignment for Multimodal Human Activity Recognition with Arbitrarily Missing Views |
Duc-Anh Nguyen et.al. |
2602.08755 |
null |
| 2026-02-09 |
Large Language Lobotomy: Jailbreaking Mixture-of-Experts via Expert Silencing |
Jona te Lintelo et.al. |
2602.08741 |
null |
| 2026-02-09 |
6G-Bench: An Open Benchmark for Semantic Communication and Network-Level Reasoning with Foundation Models in AI-Native 6G Networks |
Mohamed Amine Ferrag et.al. |
2602.08675 |
null |
| 2026-02-09 |
Fundamental Reasoning Paradigms Induce Out-of-Domain Generalization in Language Models |
Mingzi Cao et.al. |
2602.08658 |
null |
| 2026-02-09 |
Sparse Models, Sparse Safety: Unsafe Routes in Mixture-of-Experts LLMs |
Yukun Jiang et.al. |
2602.08621 |
null |
| 2026-02-09 |
TEAM: Temporal-Spatial Consistency Guided Expert Activation for MoE Diffusion Language Model Acceleration |
Linye Wei et.al. |
2602.08404 |
null |
| 2026-02-06 |
Parameters as Experts: Adapting Vision Models with Dynamic Parameter Routing |
Meng Lou et.al. |
2602.06862 |
null |
| 2026-02-06 |
POP: Online Structural Pruning Enables Efficient Inference of Large Foundation Models |
Yi Chen et.al. |
2602.06822 |
null |
| 2026-02-06 |
HyPER: Bridging Exploration and Exploitation for Scalable LLM Reasoning with Hypothesis Path Expansion and Reduction |
Shengxuan Qiu et.al. |
2602.06527 |
null |
| 2026-02-05 |
To 2:4 Sparsity and Beyond: Neuron-level Activation Function to Accelerate LLM Pre-Training |
Meghana Madhyastha et.al. |
2602.06183 |
null |
| 2026-02-05 |
MoSE: Mixture of Slimmable Experts for Efficient and Adaptive Language Models |
Nurbek Tastan et.al. |
2602.06154 |
null |
| 2026-02-05 |
OmniMoE: An Efficient MoE by Orchestrating Atomic Experts at Scale |
Jingze Shi et.al. |
2602.05711 |
null |
| 2026-02-04 |
Rule-Based Spatial Mixture-of-Experts U-Net for Explainable Edge Detection |
Bharadwaj Dogga et.al. |
2602.05100 |
null |
| 2026-02-04 |
Multi-Head LatentMoE and Head Parallel: Communication-Efficient and Deterministic MoE Parallelism |
Chenwei Cui et.al. |
2602.04870 |
null |
| 2026-02-04 |
ERNIE 5.0 Technical Report |
Haifeng Wang et.al. |
2602.04705 |
null |
| 2026-02-04 |
Let Experts Feel Uncertainty: A Multi-Expert Label Distribution Approach to Probabilistic Time Series Forecasting |
Zhen Zhou et.al. |
2602.04678 |
null |
| 2026-02-04 |
RASA: Routing-Aware Safety Alignment for Mixture-of-Experts Models |
Jiacheng Liang et.al. |
2602.04448 |
null |
| 2026-02-04 |
Mixture of Masters: Sparse Chess Language Models with Player Routing |
Giacomo Frisoni et.al. |
2602.04447 |
null |
| 2026-02-04 |
Expert Selections In MoE Models Reveal (Almost) As Much As Text |
Amir Nuriyev et.al. |
2602.04105 |
null |
| 2026-02-03 |
SpecMD: A Comprehensive Study On Speculative Expert Prefetching |
Duc Hoang et.al. |
2602.03921 |
null |
| 2026-02-03 |
DALI: A Workload-Aware Offloading Framework for Efficient MoE Inference on Local PCs |
Zeyu Zhu et.al. |
2602.03495 |
null |
| 2026-02-03 |
Scaling Continual Learning with Bi-Level Routing Mixture-of-Experts |
Meng Lou et.al. |
2602.03473 |
null |
| 2026-02-03 |
VIRAL: Visual In-Context Reasoning via Analogy in Diffusion Transformers |
Zhiwen Li et.al. |
2602.03210 |
null |
| 2026-02-03 |
Sparsity is Combinatorial Depth: Quantifying MoE Expressivity via Tropical Geometry |
Ye Su et.al. |
2602.03204 |
null |
| 2026-02-02 |
SPARKLING: Balancing Signal Preservation and Symmetry Breaking for Width-Progressive Learning |
Qifan Yu et.al. |
2602.02472 |
null |
| 2026-02-02 |
Indications of Belief-Guided Agency and Meta-Cognitive Monitoring in Large Language Models |
Noam Steinmetz Yalon et.al. |
2602.02467 |
null |
| 2026-02-02 |
From Directions to Regions: Decomposing Activations in Language Models via Local Geometry |
Or Shafran et.al. |
2602.02464 |
null |
| 2026-02-02 |
DFKI-Speech System for WildSpoof Challenge: A robust framework for SASV In-the-Wild |
Arnab Das et.al. |
2602.02286 |
null |
| 2026-02-02 |
MoLF: Mixture-of-Latent-Flow for Pan-Cancer Spatial Gene Expression Prediction from Histology |
Susu Hu et.al. |
2602.02282 |
null |
| 2026-02-02 |
Edge-Aligned Initialization of Kernels for Steered Mixture-of-Experts |
Martin Determann et.al. |
2602.02031 |
null |
| 2026-02-02 |
SAME: Stabilized Mixture-of-Experts for Multimodal Continual Instruction Tuning |
Zhen-Hao Xie et.al. |
2602.01990 |
null |
| 2026-02-02 |
Mixture-of-Experts with Intermediate CTC Supervision for Accented Speech Recognition |
Wonjun Lee et.al. |
2602.01967 |
null |
| 2026-02-02 |
SOPRAG: Multi-view Graph Experts Retrieval for Industrial Standard Operating Procedures |
Liangtao Lin et.al. |
2602.01858 |
null |
| 2026-02-02 |
Mutual-Guided Expert Collaboration for Cross-Subject EEG Classification |
Zhi Zhang et.al. |
2602.01728 |
null |
| 2026-01-31 |
Improving Minimax Estimation Rates for Contaminated Mixture of Multinomial Logistic Experts via Expert Heterogeneity |
Fanqi Yan et.al. |
2602.00939 |
null |
| 2026-01-31 |
Dynamic Expert Sharing: Decoupling Memory from Parallelism in Mixture-of-Experts Diffusion LLMs |
Hao Mark Chen et.al. |
2602.00879 |
null |
| 2026-01-31 |
Toward Reliable Sim-to-Real Predictability for MoE-based Robust Quadrupedal Locomotion |
Tianyang Wu et.al. |
2602.00678 |
null |
| 2026-01-31 |
SEER: Transformer-based Robust Time Series Forecasting via Automated Patch Enhancement and Replacement |
Xiangfei Qiu et.al. |
2602.00589 |
null |
| 2026-01-31 |
PROBE: Co-Balancing Computation and Communication in MoE Inference via Real-Time Predictive Prefetching |
Qianchao Zhu et.al. |
2602.00509 |
null |
| 2026-01-30 |
UrbanMoE: A Sparse Multi-Modal Mixture-of-Experts Framework for Multi-Task Urban Region Profiling |
Pingping Liu et.al. |
2601.22746 |
null |
| 2026-01-30 |
A Step Back: Prefix Importance Ratio Stabilizes Policy Optimization |
Shiye Lei et.al. |
2601.22718 |
null |
| 2026-01-30 |
A Unified Study of LoRA Variants: Taxonomy, Review, Codebase, and Empirical Evaluation |
Haonan He et.al. |
2601.22708 |
null |
| 2026-01-30 |
Test-Time Mixture of World Models for Embodied Agents in Dynamic Environments |
Jinwoo Jang et.al. |
2601.22647 |
null |
| 2026-01-30 |
SpanNorm: Reconciling Training Stability and Performance in Deep Transformers |
Chao Wang et.al. |
2601.22580 |
null |
| 2026-01-30 |
Continual Policy Distillation from Distributed Reinforcement Learning Teachers |
Yuxuan Li et.al. |
2601.22475 |
null |
| 2026-01-29 |
ECO: Quantized Training without Full-Precision Master Weights |
Mahdi Nikdan et.al. |
2601.22101 |
null |
| 2026-01-29 |
MoE-ACT: Improving Surgical Imitation Learning Policies through Supervised Mixture-of-Experts |
Lorenzo Mazza et.al. |
2601.21971 |
null |
| 2026-01-29 |
MoHETS: Long-term Time Series Forecasting with Mixture-of-Heterogeneous-Experts |
Evandro S. Ortigossa et.al. |
2601.21866 |
null |
| 2026-01-29 |
Seg-MoE: Multi-Resolution Segment-wise Mixture-of-Experts for Time Series Forecasting Transformers |
Evandro S. Ortigossa et.al. |
2601.21641 |
null |
| 2026-01-29 |
Multi-Modal Time Series Prediction via Mixture of Modulated Experts |
Lige Zhang et.al. |
2601.21547 |
null |
| 2026-01-29 |
ShardMemo: Masked MoE Routing for Sharded Agentic LLM Memory |
Yang Zhao et.al. |
2601.21545 |
null |
| 2026-01-29 |
L $^3$ : Large Lookup Layers |
Albert Tseng et.al. |
2601.21461 |
null |
| 2026-01-29 |
L2R: Low-Rank and Lipschitz-Controlled Routing for Mixture-of-Experts |
Minghao Yang et.al. |
2601.21349 |
null |
| 2026-01-29 |
Abstracting Robot Manipulation Skills via Mixture-of-Experts Diffusion Policies |
Ce Hao et.al. |
2601.21251 |
null |
| 2026-01-29 |
Scaling Embeddings Outperforms Scaling Experts in Language Models |
Hong Liu et.al. |
2601.21204 |
null |
| 2026-01-29 |
ZipMoE: Efficient On-Device MoE Serving via Lossless Compression and Cache-Affinity Scheduling |
Yuchen Yang et.al. |
2601.21198 |
null |
| 2026-01-29 |
BrainStack: Neuro-MoE with Functionally Guided Expert Routing for EEG-Based Language Decoding |
Ziyi Zhao et.al. |
2601.21148 |
null |
| 2026-01-29 |
TRACE: Trajectory Recovery for Continuous Mechanism Evolution in Causal Representation Learning |
Shicheng Fan et.al. |
2601.21135 |
null |
| 2026-01-28 |
ProfInfer: An eBPF-based Fine-Grained LLM Inference Profiler |
Bohua Zou et.al. |
2601.20755 |
null |
| 2026-01-28 |
Unsupervised Ensemble Learning Through Deep Energy-based Models |
Ariel Maymon et.al. |
2601.20556 |
null |
| 2026-01-28 |
OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution |
Le Zhang et.al. |
2601.20380 |
null |
| 2026-01-28 |
OSDEnhancer: Taming Real-World Space-Time Video Super-Resolution with One-Step Diffusion |
Shuoyan Wei et.al. |
2601.20308 |
null |
| 2026-01-28 |
MiLorE-SSL: Scaling Multilingual Capabilities in Self-Supervised Models without Forgetting |
Jing Xu et.al. |
2601.20300 |
null |
| 2026-01-28 |
HE-SNR: Uncovering Latent Logic via Entropy for Guiding Mid-Training on SWE-BENCH |
Yueyang Wang et.al. |
2601.20255 |
null |
| 2026-01-28 |
Control Models for In-IDE Code Completion |
Aral de Moor et.al. |
2601.20223 |
null |
| 2026-01-28 |
Hyperparameter Transfer with Mixture-of-Expert Layers |
Tianze Jiang et.al. |
2601.20205 |
null |
| 2026-01-27 |
Revisiting Incremental Stochastic Majorization-Minimization Algorithms with Applications to Mixture of Experts |
TrungKhang Tran et.al. |
2601.19811 |
null |
| 2026-01-27 |
Component-Level Lesioning of Language Models Reveals Clinically Aligned Aphasia Phenotypes |
Yifan Wang et.al. |
2601.19723 |
null |
| 2026-01-27 |
Dynamic Multi-Expert Projectors with Stabilized Routing for Multilingual Speech Recognition |
Isha Pandey et.al. |
2601.19451 |
null |
| 2026-01-26 |
Fauna Sprout: A lightweight, approachable, developer-ready humanoid robot |
Fauna Robotics et.al. |
2601.18963 |
null |
| 2026-01-26 |
OneVoice: One Model, Triple Scenarios-Towards Unified Zero-shot Voice Conversion |
Zhichao Wang et.al. |
2601.18094 |
null |
| 2026-01-26 |
LatentMoE: Toward Optimal Accuracy per FLOP and Parameter in Mixture of Experts |
Venmugil Elango et.al. |
2601.18089 |
null |
| 2026-01-25 |
Domain-Expert-Guided Hybrid Mixture-of-Experts for Medical AI: Integrating Data-Driven Learning with Clinical Priors |
Jinchen Gu et.al. |
2601.17977 |
null |
| 2026-01-25 |
$\infty$ -MoE: Generalizing Mixture of Experts to Infinite Experts |
Shota Takashiro et.al. |
2601.17680 |
null |
| 2026-01-24 |
PILOT: A Perceptive Integrated Low-level Controller for Loco-manipulation over Unstructured Scenes |
Xinru Cui et.al. |
2601.17440 |
null |
| 2026-01-23 |
Least-Loaded Expert Parallelism: Load Balancing An Imbalanced Mixture-of-Experts |
Xuan-Phi Nguyen et.al. |
2601.17111 |
null |
| 2026-01-22 |
FlashMoE: Reducing SSD I/O Bottlenecks via ML-Based Cache Replacement for Mixture-of-Experts Inference on Edge Devices |
Byeongju Kim et.al. |
2601.17063 |
null |
| 2026-01-23 |
GRIP: Algorithm-Agnostic Machine Unlearning for Mixture-of-Experts via Geometric Router Constraints |
Andy Zhu et.al. |
2601.16905 |
null |
| 2026-01-23 |
Mixture-of-Models: Unifying Heterogeneous Agents via N-Way Self-Evaluating Deliberation |
Tims Pecerskis et.al. |
2601.16863 |
null |
| 2026-01-23 |
LongCat-Flash-Thinking-2601 Technical Report |
Meituan LongCat Team et.al. |
2601.16725 |
null |
| 2026-01-22 |
LL-GaussianImage: Efficient Image Representation for Zero-shot Low-Light Enhancement with 2D Gaussian Splatting |
Yuhan Chen et.al. |
2601.15772 |
null |
| 2026-01-21 |
Improving MoE Compute Efficiency by Composing Weight and Data Sparsity |
Maciej Kilian et.al. |
2601.15370 |
null |
| 2026-01-21 |
Mixture-of-Experts Models in Vision: Routing, Optimization, and Generalization |
Adam Rokah et.al. |
2601.15021 |
null |
| 2026-01-21 |
Modeling the Thermal Behavior of Photopolymers for In-Space Fabrication |
Jonathan Ericson et.al. |
2601.14897 |
null |
| 2026-01-21 |
UniRoute: Unified Routing Mixture-of-Experts for Modality-Adaptive Remote Sensing Change Detection |
Qingling Shu et.al. |
2601.14797 |
null |
| 2026-01-21 |
Robustness of Mixtures of Experts to Feature Noise |
Dong Sun et.al. |
2601.14792 |
null |
| 2026-01-20 |
Layer-adaptive Expert Pruning for Pre-Training of Mixture-of-Experts Large Language Models |
YuanLab. ai et.al. |
2601.14327 |
null |
| 2026-01-20 |
Understanding Multilingualism in Mixture-of-Experts LLMs: Routing Mechanism, Expert Specialization, and Layerwise Steering |
Yuxin Chen et.al. |
2601.14050 |
null |
| 2026-01-20 |
DExTeR: Weakly Semi-Supervised Object Detection with Class and Instance Experts for Medical Imaging |
Adrien Meyer et.al. |
2601.13954 |
null |
| 2026-01-20 |
The ALMA survey to Resolve exoKuiper belt Substructures (ARKS) II. The radial structure of debris discs |
Yinuo Han et.al. |
2601.13670 |
null |
| 2026-01-20 |
MN-TSG:Continuous Time Series Generation with Irregular Observations |
Xu Zhang et.al. |
2601.13534 |
null |
| 2026-01-19 |
CLIP-Guided Adaptable Self-Supervised Learning for Human-Centric Visual Tasks |
Mingshuang Luo et.al. |
2601.13133 |
null |
| 2026-01-19 |
Polychronous Wave Computing: Timing-Native Address Selection in Spiking Networks |
Natalila G. Berloff et.al. |
2601.13079 |
null |
| 2026-01-19 |
PASs-MoE: Mitigating Misaligned Co-drift among Router and Experts via Pathway Activation Subspaces for Continual Learning |
Zhiyan Hou et.al. |
2601.13020 |
null |
| 2026-01-19 |
HT-GNN: Hyper-Temporal Graph Neural Network for Customer Lifetime Value Prediction in Baidu Ads |
Xiaohui Zhao et.al. |
2601.13013 |
null |
| 2026-01-19 |
OFA-MAS: One-for-All Multi-Agent System Topology Design based on Mixture-of-Experts Graph Generative Models |
Shiyuan Li et.al. |
2601.12996 |
null |
| 2026-01-19 |
PhyG-MoE: A Physics-Guided Mixture-of-Experts Framework for Energy-Efficient GNSS Interference Recognition |
Zhihan Zeng et.al. |
2601.12798 |
null |
| 2026-01-18 |
The ALMA survey to Resolve exoKuiper belt Substructures (ARKS) V: Comparison between scattered light and thermal emission |
J. Milli et.al. |
2601.12586 |
null |
| 2026-01-18 |
A Mixture of Experts Vision Transformer for High-Fidelity Surface Code Decoding |
Hoang Viet Nguyen et.al. |
2601.12483 |
null |
| 2026-01-18 |
Learning Diverse Skills for Behavior Models with Mixture of Experts |
Wangtian Shen et.al. |
2601.12397 |
null |
| 2026-01-18 |
NADIR: Differential Attention Flow for Non-Autoregressive Transliteration in Indic Languages |
Lakshya Tomar et.al. |
2601.12389 |
null |
| 2026-01-18 |
GazeFormer-MoE: Context-Aware Gaze Estimation via CLIP and MoE Transformer |
Xinyuan Zhao et.al. |
2601.12316 |
null |
| 2026-01-18 |
Facet-Aware Multi-Head Mixture-of-Experts Model with Text-Enhanced Pre-training for Sequential Recommendation |
Mingrui Liu et.al. |
2601.12301 |
null |
| 2026-01-17 |
EMoE: Eigenbasis-Guided Routing for Mixture-of-Experts |
Anzhe Cheng et.al. |
2601.12137 |
null |
| 2026-01-17 |
The ALMA survey to Resolve exoKuiper belt Substructures (ARKS) III: The vertical structure of debris disks |
Brianna Zawadzki et.al. |
2601.12128 |
null |
| 2026-01-17 |
One-Shot Price Forecasting with Covariate-Guided Experts under Privacy Constraints |
Ren He et.al. |
2601.11977 |
null |
| 2026-01-16 |
The ALMA survey to Resolve exoKuiper belt Substructures (ARKS) VII: Optically thick gas with broad CO gaussian local line profiles in the HD 121617 disc |
A. Brennan et.al. |
2601.11824 |
null |
| 2026-01-16 |
Self-Augmented Mixture-of-Experts for QoS Prediction |
Kecheng Cai et.al. |
2601.11036 |
null |
| 2026-01-16 |
RobuMTL: Enhancing Multi-Task Learning Robustness Against Weather Conditions |
Tasneem Shaffee et.al. |
2601.10921 |
null |
| 2026-01-15 |
MoST: Mixing Speech and Text with Modality-Aware Mixture of Experts |
Yuxuan Lou et.al. |
2601.10272 |
null |
| 2026-01-15 |
MMPG: MoE-based Adaptive Multi-Perspective Graph Fusion for Protein Representation Learning |
Yusong Wang et.al. |
2601.10157 |
null |
| 2026-01-14 |
Progressive Mixture-of-Experts with autoencoder routing for continual RANS turbulence modelling |
Haoyu Ji et.al. |
2601.09305 |
null |
| 2026-01-15 |
A.X K1 Technical Report |
Sung Jun Cheon et.al. |
2601.09200 |
null |
| 2026-01-14 |
WiFo-E: A Scalable Wireless Foundation Model for End-to-End FDD Precoding in Communication Networks |
Weibo Wen et.al. |
2601.09186 |
null |
| 2026-01-14 |
Horseshoe Mixtures-of-Experts (HS-MoE) |
Nick Polson et.al. |
2601.09043 |
null |
| 2026-01-13 |
LookAhead: The Optimal Non-decreasing Index Policy for a Time-Varying Holding Cost problem |
Keerthana Gurushankar et.al. |
2601.08960 |
null |
| 2026-01-13 |
MixServe: An Automatic Distributed Serving System for MoE Models with Hybrid Parallelism Based on Fused Communication Algorithm |
Bowen Zhou et.al. |
2601.08800 |
null |
| 2026-01-13 |
LWM-Spectro: A Foundation Model for Wireless Baseband Signal Spectrograms |
Namhyun Kim et.al. |
2601.08780 |
null |
| 2026-01-13 |
M$^2$FMoE: Multi-Resolution Multi-View Frequency Mixture-of-Experts for Extreme-Adaptive Time Series Forecasting
Yaohui Huang et.al. |
2601.08631 |
null |
| 2026-01-13 |
Taxon: Hierarchical Tax Code Prediction with Semantically Aligned LLM Expert Guidance |
Jihang Li et.al. |
2601.08418 |
null |
| 2026-01-13 |
Deconstructing Pre-training: Knowledge Attribution Analysis in MoE and Dense Models |
Bo Wang et.al. |
2601.08383 |
null |
| 2026-01-13 |
Towards Principled Design of Mixture-of-Experts Language Models under Memory and Inference Constraints |
Seng Pei Liew et.al. |
2601.08215 |
null |
| 2026-01-12 |
Towards Specialized Generalists: A Multi-Task MoE-LoRA Framework for Domain-Specific LLM Adaptation |
Yuxin Yang et.al. |
2601.07935 |
null |
| 2026-01-12 |
Emotional Support Evaluation Framework via Controllable and Diverse Seeker Simulator |
Chaewon Heo et.al. |
2601.07698 |
null |
| 2026-01-12 |
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models |
Xin Cheng et.al. |
2601.07372 |
null |
| 2026-01-11 |
Solar Open Technical Report |
Sungrae Park et.al. |
2601.07022 |
null |
| 2026-01-11 |
Deep Learning Based Channel Extrapolation for Dual-Band Massive MIMO Systems |
Qikai Xiao et.al. |
2601.06858 |
null |
| 2026-01-11 |
MoE-DisCo: Low Economy Cost Training Mixture-of-Experts Models
Xin Ye et.al. |
2601.06857 |
null |
| 2026-01-11 |
MoEScore: Mixture-of-Experts-Based Text-Audio Relevance Score Prediction for Text-to-Audio System Evaluation |
Bochao Sun et.al. |
2601.06829 |
null |
| 2026-01-11 |
SecMoE: Communication-Efficient Secure MoE Inference via Select-Then-Compute |
Bowen Shen et.al. |
2601.06790 |
null |
| 2026-01-10 |
Hellinger Multimodal Variational Autoencoders |
Huyen Khanh Vo et.al. |
2601.06572 |
null |
| 2026-01-10 |
Physics-guided foundation model for universal speckle removal in ultrathin multimode fiber imaging |
Xianrui Zeng et.al. |
2601.06448 |
null |
| 2026-01-09 |
Monkey Jump: MoE-Style PEFT for Efficient Multi-Task Learning
Nusrat Jahan Prottasha et.al. |
2601.06356 |
null |
| 2026-01-09 |
Reconstruction of atmospheric neutrinos in DUNE's horizontal-drift far-detector module |
DUNE Collaboration et.al. |
2601.05697 |
null |
| 2026-01-09 |
Scalable Heterogeneous Graph Learning via Heterogeneous-aware Orthogonal Prototype Experts |
Wei Zhou et.al. |
2601.05537 |
null |
| 2026-01-08 |
MoEBlaze: Breaking the Memory Wall for Efficient MoE Training on Modern GPUs |
Jiyuan Zhang et.al. |
2601.05296 |
null |
| 2026-01-08 |
MoE3D: A Mixture-of-Experts Module for 3D Reconstruction |
Zichen Wang et.al. |
2601.05208 |
null |
| 2026-01-08 |
FaST: Efficient and Effective Long-Horizon Forecasting for Large-Scale Spatial-Temporal Graphs via Mixture-of-Experts |
Yiji Zhao et.al. |
2601.05174 |
null |
| 2026-01-08 |
How to Set the Learning Rate for Large-Scale Pre-training? |
Yunhua Zhou et.al. |
2601.05049 |
null |
| 2026-01-08 |
DR-LoRA: Dynamic Rank LoRA for Mixture-of-Experts Adaptation |
Guanzhi Deng et.al. |
2601.04823 |
null |
| 2026-01-07 |
A Scheduling Framework for Efficient MoE Inference on Edge GPU-NDP Systems |
Qi Wu et.al. |
2601.03992 |
null |
| 2026-01-07 |
Spectral Manifold Regularization for Stable and Modular Routing in Deep MoE Architectures |
Ibrahim Delibasoglu et.al. |
2601.03889 |
null |
| 2026-01-07 |
Variational Inference, Entropy, and Orthogonality: A Unified Theory of Mixture-of-Experts |
Ye Su et.al. |
2601.03577 |
null |
| 2026-01-07 |
CALM: Culturally Self-Aware Language Models |
Lingzhi Shen et.al. |
2601.03483 |
null |
| 2026-01-06 |
The Illusion of Specialization: Unveiling the Domain-Invariant "Standing Committee" in Mixture-of-Experts Models |
Yan Wang et.al. |
2601.03425 |
null |
| 2026-01-06 |
ReCCur: A Recursive Corner-Case Curation Framework for Robust Vision-Language Understanding in Open and Edge Scenarios |
Yihan Wei et.al. |
2601.03011 |
null |
| 2026-01-06 |
MoE Adapter for Large Audio Language Models: Sparsity, Disentanglement, and Gradient-Conflict-Free |
Yishu Lei et.al. |
2601.02967 |
null |
| 2026-01-06 |
MixTTE: Multi-Level Mixture-of-Experts for Scalable and Adaptive Travel Time Estimation |
Wenzhao Jiang et.al. |
2601.02943 |
null |
| 2026-01-06 |
MiMo-V2-Flash Technical Report |
Bangjun Xiao et.al. |
2601.02780 |
null |
| 2026-01-05 |
Routing by Analogy: kNN-Augmented Expert Assignment for Mixture-of-Experts |
Boxuan Lyu et.al. |
2601.02144 |
null |
| 2026-01-05 |
GCR: Geometry-Consistent Routing for Task-Agnostic Continual Anomaly Detection |
Joongwon Chae et.al. |
2601.01856 |
null |
| 2026-01-05 |
K-EXAONE Technical Report |
Eunbi Choi et.al. |
2601.01739 |
null |
| 2026-01-05 |
Yuan3.0 Flash: An Open Multimodal Large Language Model for Enterprise Applications |
YuanLab.ai et.al.
2601.01718 |
null |
| 2026-01-05 |
Varying-Coefficient Mixture of Experts Model |
Qicheng Zhao et.al. |
2601.01699 |
null |
| 2026-01-04 |
Multi-Subspace Multi-Modal Modeling for Diffusion Models: Estimation, Convergence and Mixture of Experts |
Ruofeng Yang et.al. |
2601.01475 |
null |
| 2026-01-04 |
Making MoE based LLM inference resilient with Tarragon |
Songyu Zhang et.al. |
2601.01310 |
null |
| 2026-01-03 |
MambaFormer: Token-Level Guided Routing Mixture-of-Experts for Accurate and Efficient Clinical Assistance |
Hamad Khan et.al. |
2601.01260 |
null |
| 2026-01-02 |
Reliability Under Randomness: An Empirical Analysis of Sparse and Dense Language Models Across Decoding Temperatures |
Kabir Grover et.al. |
2601.00942 |
null |
| 2026-01-02 |
HFedMoE: Resource-aware Heterogeneous Federated Learning with Mixture-of-Experts |
Zihan Fang et.al. |
2601.00583 |
null |
| 2026-01-01 |
Geometric Regularization in Mixture-of-Experts: The Disconnect Between Weights and Activations |
Hyunjun Kim et.al. |
2601.00457 |
null |
| 2026-01-01 |
Identification and Estimation under Multiple Versions of Treatment: Mixture-of-Experts Approach |
Kohei Yoshikawa et.al. |
2601.00287 |
null |
| 2025-12-31 |
Compute-Accuracy Pareto Frontiers for Open-Source Reasoning Large Language Models |
Ákos Prucs et.al. |
2512.24776 |
null |
| 2026-01-01 |
Sufficient and Necessary Conditions for Eckart-Young like Result for Tubal Tensors |
Uria Mor et.al. |
2512.24405 |
null |
| 2025-12-30 |
Quantum Computing, Ising Formulation, and the Traveling Salesman Problem |
Omer Gurevich et.al. |
2512.24308 |
null |
| 2025-12-30 |
Training Report of TeleChat3-MoE |
Xinzhang Liu et.al. |
2512.24157 |
null |
| 2025-12-30 |
RepetitionCurse: Measuring and Understanding Router Imbalance in Mixture-of-Experts LLMs under DoS Stress |
Ruixuan Huang et.al. |
2512.23995 |
null |
| 2025-12-30 |
Learnable Query Aggregation with KV Routing for Cross-view Geo-localisation |
Hualin Ye et.al. |
2512.23938 |
null |
| 2025-12-29 |
Dynamic Subspace Composition: Efficient Adaptation via Contractive Basis Expansion |
Vladimer Khasia et.al. |
2512.23448 |
null |
| 2025-12-29 |
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss |
Ang Lv et.al. |
2512.23447 |
null |
| 2025-12-30 |
YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection |
Xu Lin et.al. |
2512.23273 |
null |
| 2025-12-28 |
Trust Region Masking for Long-Horizon LLM Reinforcement Learning |
Yingru Li et.al. |
2512.23075 |
null |
| 2025-12-28 |
FLEX-MoE: Federated Mixture-of-Experts with Load-balanced Expert Assignment |
Boyang Zhang et.al. |
2512.23070 |
null |
| 2025-12-28 |
Viability and Performance of a Private LLM Server for SMBs: A Benchmark Analysis of Qwen3-30B on Consumer-Grade Hardware |
Alex Khalil et.al. |
2512.23029 |
null |
| 2025-12-28 |
Text-Routed Sparse Mixture-of-Experts Model with Explanation and Temporal Alignment for Multi-Modal Sentiment Analysis |
Dongning Rao et.al. |
2512.22741 |
null |
| 2025-12-27 |
Bright 4B: Scaling Hyperspherical Learning for Segmentation in 3D Brightfield Microscopy |
Amil Khan et.al. |
2512.22423 |
null |
| 2025-12-26 |
FUSCO: High-Performance Distributed Data Shuffling via Transformation-Communication Fusion |
Zhuoran Zhu et.al. |
2512.22036 |
null |
| 2025-12-26 |
SWE-RM: Execution-free Feedback For Software Engineering Agents |
KaShun Shum et.al. |
2512.21919 |
null |
| 2025-12-26 |
Accelerate Speculative Decoding with Sparse Computation in Verification |
Jikai Wang et.al. |
2512.21911 |
null |
| 2025-12-26 |
MMCTOP: A Multimodal Textualization and Mixture-of-Experts Framework for Clinical Trial Outcome Prediction |
Carolina Aparício et.al. |
2512.21897 |
null |
| 2025-12-25 |
Spatiotemporal-Untrammelled Mixture of Experts for Multi-Person Motion Prediction |
Zheng Yin et.al. |
2512.21707 |
null |
| 2025-12-25 |
Efficient MoE Inference with Fine-Grained Scheduling of Disaggregated Expert Parallelism |
Xinglin Pan et.al. |
2512.21487 |
null |
| 2025-12-24 |
DeepCQ: General-Purpose Deep-Surrogate Framework for Lossy Compression Quality Prediction |
Khondoker Mirazul Mumenin et.al. |
2512.21433 |
null |
| 2025-12-25 |
GateBreaker: Gate-Guided Attacks on Mixture-of-Expert LLMs |
Lichao Wu et.al. |
2512.21008 |
null |
| 2025-12-24 |
RevFFN: Memory-Efficient Full-Parameter Fine-Tuning of Mixture-of-Experts LLMs with Reversible Blocks |
Ningyuan Liu et.al. |
2512.20920 |
null |
| 2025-12-24 |
NVIDIA Nemotron 3: Efficient and Open Intelligence |
NVIDIA et.al. |
2512.20856 |
null |
| 2025-12-23 |
Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning |
NVIDIA et.al. |
2512.20848 |
null |
| 2025-12-23 |
Defending against adversarial attacks using mixture of experts |
Mohammad Meymani et.al. |
2512.20821 |
null |
| 2025-12-23 |
MoE-DiffuSeq: Enhancing Long-Document Diffusion Models with Sparse Attention and Mixture of Experts |
Alexandros Christoforos et.al. |
2512.20604 |
null |
| 2025-12-23 |
Branch Learning in MRI: More Data, More Models, More Training |
Yuyang Li et.al. |
2512.20330 |
null |
| 2025-12-23 |
Mixture-of-Experts with Gradient Conflict-Driven Subspace Topology Pruning for Emergent Modularity |
Yuxing Gan et.al. |
2512.20291 |
null |
| 2025-12-23 |
Degradation-Aware Metric Prompting for Hyperspectral Image Restoration |
Binfeng Wang et.al. |
2512.20251 |
null |
| 2025-12-23 |
AMoE: Agglomerative Mixture-of-Experts Vision Foundation Model |
Sofian Chaybouti et.al. |
2512.20157 |
null |
| 2025-12-22 |
UCCL-EP: Portable Expert-Parallel Communication |
Ziming Mao et.al. |
2512.19849 |
null |
| 2025-12-22 |
Towards Closed-Loop Embodied Empathy Evolution: Probing LLM-Centric Lifelong Empathic Motion Generation in Unseen Scenarios |
Jiawen Wang et.al. |
2512.19551 |
null |
| 2025-12-22 |
EGM: Efficiently Learning General Motion Tracking Policy for High Dynamic Humanoid Whole-Body Control |
Chao Yang et.al. |
2512.19043 |
null |
| 2025-12-21 |
Tempo as the Stable Cue: Hierarchical Mixture of Tempo and Beat Experts for Music to 3D Dance Generation |
Guangtao Lyu et.al. |
2512.18804 |
null |
| 2025-12-21 |
Rectification Reimagined: A Unified Mamba Model for Image Correction and Rectangling with Prompts |
Linwei Qiu et.al. |
2512.18718 |
null |
| 2025-12-21 |
Remoe: Towards Efficient and Low-Cost MoE Inference in Serverless Computing |
Wentao Liu et.al. |
2512.18674 |
null |
| 2025-12-20 |
Secret mixtures of experts inside your LLM |
Enric Boix-Adsera et.al. |
2512.18452 |
null |
| 2025-12-20 |
MoE Pathfinder: Trajectory-driven Expert Pruning |
Xican Yang et.al. |
2512.18425 |
null |
| 2025-12-20 |
MACE-Dance: Motion-Appearance Cascaded Experts for Music-Driven Dance Video Generation |
Kaixing Yang et.al. |
2512.18181 |
null |
| 2025-12-19 |
MoE-TransMov: A Transformer-based Model for Next POI Prediction in Familiar & Unfamiliar Movements |
Ruichen Tan et.al. |
2512.17985 |
null |
| 2025-12-22 |
SCOPE: Sequential Causal Optimization of Process Interventions |
Jakob De Moor et.al. |
2512.17629 |
null |
| 2025-12-18 |
Bandwidth-Efficient Adaptive Mixture-of-Experts via Low-Rank Compensation |
Zhenyu Liu et.al. |
2512.17073 |
null |
| 2025-12-18 |
Compression is Routing: Reconstruction Error as an Intrinsic Signal for Modular Language Models |
Zhongpan Tang et.al. |
2512.16963 |
null |
| 2025-12-18 |
An Upper Bound on the M/M/k Queue With Deterministic Setup Times |
Jalani Williams et.al. |
2512.16854 |
null |
| 2025-12-18 |
Meta-RL Induces Exploration in Language Agents |
Yulun Jiang et.al. |
2512.16848 |
null |
| 2025-12-18 |
PoseMoE: Mixture-of-Experts Network for Monocular 3D Human Pose Estimation |
Mengyuan Liu et.al. |
2512.16494 |
null |
| 2025-12-18 |
Efficient CPU-GPU Collaborative Inference for MoE-based LLMs on Memory-Limited Systems |
En-Ming Huang et.al. |
2512.16473 |
null |
| 2025-12-18 |
Pretrained Battery Transformer (PBT): A battery life prediction foundation model |
Ruifeng Tan et.al. |
2512.16334 |
null |
| 2025-12-19 |
Sigma-MoE-Tiny Technical Report |
Qingguo Hu et.al. |
2512.16248 |
null |
| 2025-12-18 |
INTELLECT-3: Technical Report |
Prime Intellect Team et.al. |
2512.16144 |
null |
| 2025-12-18 |
Let the Barbarians In: How AI Can Accelerate Systems Performance Research |
Audrey Cheng et.al. |
2512.14806 |
null |
| 2025-12-15 |
SocialNav-MoE: A Mixture-of-Experts Vision Language Model for Socially Compliant Navigation with Reinforcement Fine-Tuning |
Tomohito Kawabata et.al. |
2512.14757 |
null |
| 2025-12-16 |
SketchAssist: A Practical Assistant for Semantic Edits and Precise Local Redrawing |
Han Zou et.al. |
2512.14140 |
null |
| 2025-12-16 |
SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations |
Wentao Guo et.al. |
2512.14080 |
null |
| 2025-12-16 |
Sparsity-Controllable Dynamic Top-p MoE for Large Foundation Model Pre-training |
Can Jin et.al. |
2512.13996 |
null |
| 2025-12-13 |
RAST-MoE-RL: A Regime-Aware Spatio-Temporal MoE Framework for Deep Reinforcement Learning in Ride-Hailing |
Yuhan Tang et.al. |
2512.13727 |
null |
| 2025-12-15 |
StutterFuse: Mitigating Modality Collapse in Stuttering Detection with Jaccard-Weighted Metric Learning and Gated Fusion |
Guransh Singh et.al. |
2512.13632 |
null |
| 2025-12-16 |
Janus: Disaggregating Attention and Experts for Scalable MoE Inference |
Zhexiang Zhang et.al. |
2512.13525 |
null |
| 2025-12-15 |
Automated Information Flow Selection for Multi-scenario Multi-task Recommendation |
Chaohua Yang et.al. |
2512.13396 |
null |
| 2025-12-13 |
Fine-Grained Zero-Shot Learning with Attribute-Centric Representations |
Zhi Chen et.al. |
2512.12219 |
null |
| 2025-12-13 |
MixtureKit: A General Framework for Composing, Training, and Visualizing Mixture-of-Experts Models |
Ahmad Chamma et.al. |
2512.12121 |
null |
| 2025-12-11 |
Enhancing Radiology Report Generation and Visual Grounding using Reinforcement Learning |
Benjamin Gundersen et.al. |
2512.10691 |
null |
| 2025-12-11 |
Unleashing Degradation-Carrying Features in Symmetric U-Net: Simpler and Stronger Baselines for All-in-One Image Restoration |
Wenlong Jiao et.al. |
2512.10581 |
null |
| 2025-12-11 |
Error-Propagation-Free Learned Video Compression With Dual-Domain Progressive Temporal Alignment |
Han Li et.al. |
2512.10450 |
null |
| 2025-12-10 |
Efficient Continual Learning in Neural Machine Translation: A Low-Rank Adaptation Approach |
Salvador Carrión et.al. |
2512.09910 |
null |
| 2025-12-10 |
DynaIP: Dynamic Image Prompt Adapter for Scalable Zero-shot Personalized Text-to-Image Generation |
Zhizhong Wang et.al. |
2512.09814 |
null |
| 2025-12-10 |
M3Net: A Multi-Metric Mixture of Experts Network Digital Twin with Graph Neural Networks |
Blessed Guda et.al. |
2512.09797 |
null |
| 2025-12-10 |
FoundIR-v2: Optimizing Pre-Training Data Mixtures for Image Restoration Foundation Model |
Xiang Chen et.al. |
2512.09282 |
null |
| 2025-12-10 |
Efficient MoE Serving in the Memory-Bound Regime: Balance Activated Experts, Not Tokens |
Yanpeng Yu et.al. |
2512.09277 |
null |
| 2025-12-09 |
Ask, Answer, and Detect: Role-Playing LLMs for Personality Detection with Question-Conditioned Mixture-of-Experts |
Yifan Lyu et.al. |
2512.08814 |
null |
| 2025-12-09 |
What really matters for person re-identification? A Mixture-of-Experts Framework for Semantic Attribute Importance |
Athena Psalta et.al. |
2512.08697 |
null |
| 2025-12-09 |
Prismatic World Model: Learning Compositional Dynamics for Planning in Hybrid Systems |
Mingwei Li et.al. |
2512.08411 |
null |
| 2025-12-08 |
LongCat-Image Technical Report |
Meituan LongCat Team et.al. |
2512.07584 |
null |
| 2025-12-08 |
Search for Light Sterile Neutrinos With Two Neutrino Beams at MicroBooNE |
MicroBooNE collaboration et.al. |
2512.07159 |
null |
| 2025-12-09 |
TrajMoE: Scene-Adaptive Trajectory Planning with Mixture of Experts and Reinforcement Learning |
Zebin Xing et.al. |
2512.07135 |
null |
| 2025-12-08 |
PlantBiMoE: A Bidirectional Foundation Model with SparseMoE for Plant Genomes |
Kepeng Lin et.al. |
2512.07113 |
null |
| 2025-12-07 |
Adaptive Normalization Mamba with Multi Scale Trend Decomposition and Patch MoE Encoding |
MinCheol Jeon et.al. |
2512.06929 |
null |
| 2025-12-07 |
Stable-MoE: Lyapunov-based Token Routing for Distributed Mixture-of-Experts Training over Edge Networks |
Long Shi et.al. |
2512.06784 |
null |
| 2025-12-07 |
Statistic-Augmented, Decoupled MoE Routing and Aggregating in Autonomous Driving |
Wei-Bin Kou et.al. |
2512.06664 |
null |
| 2025-12-06 |
Enhancing Medical Cross-Modal Hashing Retrieval using Dropout-Voting Mixture-of-Experts Fusion |
Jaewon Ahn et.al. |
2512.06449 |
null |
| 2025-12-04 |
The SAM2-to-SAM3 Gap in the Segment Anything Model Family: Why Prompt-Based Expertise Fails in Concept-Driven Image Segmentation |
Ranjan Sapkota et.al. |
2512.06032 |
null |
| 2025-12-05 |
HiMoE-VLA: Hierarchical Mixture-of-Experts for Generalist Vision-Language-Action Policies |
Zhiying Du et.al. |
2512.05693 |
null |
| 2025-12-05 |
ProPhy: Progressive Physical Alignment for Dynamic World Simulation |
Zijun Wang et.al. |
2512.05564 |
null |
| 2025-12-05 |
EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture |
Xin He et.al. |
2512.04810 |
null |
| 2025-12-04 |
Natural Language Actor-Critic: Scalable Off-Policy Learning in Language Space |
Joey Hong et.al. |
2512.04601 |
null |
| 2025-12-04 |
Context-Aware Mixture-of-Experts Inference on CXL-Enabled GPU-NDP Systems |
Zehao Fan et.al. |
2512.04476 |
null |
| 2025-12-03 |
Small Models Achieve Large Language Model Performance: Evaluating Reasoning-Enabled AI for Secure Child Welfare Research |
Zia Qi et.al. |
2512.04261 |
null |
| 2025-12-03 |
OD-MoE: On-Demand Expert Loading for Cacheless Edge-Distributed MoE Inference |
Liujianfu Wang et.al. |
2512.03927 |
null |
| 2025-12-04 |
A Theoretical Framework for Auxiliary-Loss-Free Load Balancing of Sparse Mixture-of-Experts in Large-Scale AI Models |
X. Y. Han et.al. |
2512.03915 |
null |
| 2025-12-03 |
Parsimonious Clustering of Covariance Matrices |
Yixi Xu et.al. |
2512.03912 |
null |
| 2025-12-03 |
CellScout: Visual Analytics for Mining Biomarkers in Cell State Discovery |
Rui Sheng et.al. |
2512.03485 |
null |
| 2025-12-03 |
SSLfmm: An R Package for Semi-Supervised Learning with a Mixed-Missingness Mechanism in Finite Mixture Models |
Geoffrey J. McLachlan et.al. |
2512.03322 |
null |
| 2025-12-02 |
SkyMoE: A Vision-Language Foundation Model for Enhancing Geospatial Interpretation with Mixture of Experts |
Jiaqi Liu et.al. |
2512.02517 |
null |
| 2025-12-02 |
Multi-Domain Enhanced Map-Free Trajectory Prediction with Selective Attention |
Wenyi Xiong et.al. |
2512.02368 |
null |
| 2025-12-02 |
Understanding and Harnessing Sparsity in Unified Multimodal Models |
Shwai He et.al. |
2512.02351 |
null |
| 2025-12-01 |
Towards Unified Video Quality Assessment |
Chen Feng et.al. |
2512.02224 |
null |
| 2025-12-01 |
ManualVLA: A Unified VLA Model for Chain-of-Thought Manual Generation and Robotic Manipulation |
Chenyang Gu et.al. |
2512.02013 |
null |
| 2025-12-01 |
Multimodal Mixture-of-Experts for ISAC in Low-Altitude Wireless Networks |
Kai Zhang et.al. |
2512.01750 |
null |
| 2025-12-01 |
GRASP: Guided Residual Adapters with Sample-wise Partitioning |
Felix Nützel et.al. |
2512.01675 |
null |
| 2025-12-01 |
Bridging the Scale Gap: Balanced Tiny and General Object Detection in Remote Sensing Imagery |
Zhicheng Zhao et.al. |
2512.01665 |
null |
| 2025-12-01 |
Cuffless Blood Pressure Estimation from Six Wearable Sensor Modalities in Multi-Motion-State Scenarios |
Yiqiao Chen et.al. |
2512.01653 |
null |
| 2025-12-02 |
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices |
Chujie Zheng et.al. |
2512.01374 |
null |
| 2025-12-01 |
Efficient Training of Diffusion Mixture-of-Experts Models: A Practical Recipe |
Yahui Liu et.al. |
2512.01252 |
null |
| 2025-11-30 |
Elastic Mixture of Rank-Wise Experts for Knowledge Reuse in Federated Fine-Tuning |
Yebo Wu et.al. |
2512.00902 |
null |
| 2025-11-30 |
Upcycled and Merged MoE Reward Model for Mitigating Reward Hacking |
Lingling Fu et.al. |
2512.00724 |
null |
| 2025-11-29 |
GCMCG: A Clustering-Aware Graph Attention and Expert Fusion Network for Multi-Paradigm, Multi-task, and Cross-Subject EEG Decoding |
Yiqiao Chen et.al. |
2512.00574 |
null |
| 2025-11-28 |
Hunyuan-GameCraft-2: Instruction-following Interactive Game World Model |
Junshu Tang et.al. |
2511.23429 |
null |
| 2025-11-28 |
LFM2 Technical Report |
Alexander Amini et.al. |
2511.23404 |
null |
| 2025-11-28 |
Chart2Code-MoLA: Efficient Multi-Modal Code Generation via Adaptive Expert Routing |
Yifei Wang et.al. |
2511.23321 |
null |
| 2025-11-28 |
Multi-Modal Scene Graph with Kolmogorov-Arnold Experts for Audio-Visual Question Answering |
Zijian Fu et.al. |
2511.23304 |
null |
| 2025-11-28 |
Experts are all you need: A Composable Framework for Large Language Model Inference |
Shrihari Sridharan et.al. |
2511.22955 |
null |
| 2025-11-28 |
EnECG: Efficient Ensemble Learning for Electrocardiogram Multi-task Foundation Model |
Yuhao Xu et.al. |
2511.22935 |
null |
| 2025-11-27 |
OmniInfer: System-Wide Acceleration Techniques for Optimizing LLM Serving Throughput and Latency |
Jun Wang et.al. |
2511.22481 |
null |
| 2025-11-27 |
Foundation Model for Intelligent Wireless Communications |
Boxun Liu et.al. |
2511.22222 |
null |
| 2025-11-27 |
MoE3D: Mixture of Experts meets Multi-Modal 3D Understanding |
Yu Li et.al. |
2511.22103 |
null |
| 2025-11-27 |
Qwen3-VL Technical Report |
Shuai Bai et.al. |
2511.21631 |
null |
| 2025-11-26 |
MemFine: Memory-Aware Fine-Grained Scheduling for MoE Training |
Lu Zhao et.al. |
2511.21431 |
null |
| 2025-11-26 |
MLPMoE: Zero-Shot Architectural Metamorphosis of Dense LLM MLPs into Static Mixture-of-Experts |
Ivan Novikov et.al. |
2511.21089 |
null |
| 2025-11-25 |
HBridge: H-Shape Bridging of Heterogeneous Experts for Unified Multimodal Understanding and Generation |
Xiang Wang et.al. |
2511.20520 |
null |
| 2025-11-25 |
MTBBench: A Multimodal Sequential Clinical Decision-Making Benchmark in Oncology |
Kiril Vasilev et.al. |
2511.20490 |
null |
| 2025-11-25 |
Soft Adaptive Policy Optimization |
Chang Gao et.al. |
2511.20347 |
null |
| 2025-11-25 |
ADNet: A Large-Scale and Extensible Multi-Domain Benchmark for Anomaly Detection Across 380 Real-World Categories |
Hai Ling et.al. |
2511.20169 |
null |
| 2025-11-25 |
Adaptive Knowledge Transfer for Cross-Disciplinary Cold-Start Knowledge Tracing |
Yulong Deng et.al. |
2511.20009 |
null |
| 2025-11-25 |
Mosaic Pruning: A Hierarchical Framework for Generalizable Pruning of Mixture-of-Experts Models |
Wentao Hu et.al. |
2511.19822 |
null |
| 2025-11-22 |
Exploiting the Experts: Unauthorized Compression in MoE-LLMs |
Pinaki Prasad Guha Neogi et.al. |
2511.19480 |
null |
| 2025-11-24 |
OrdMoE: Preference Alignment via Hierarchical Expert Group Ranking in Multimodal Mixture-of-Experts LLMs |
Yuting Gao et.al. |
2511.19023 |
null |
| 2025-11-24 |
Dynamic Mixture of Experts Against Severe Distribution Shifts |
Donghu Kim et.al. |
2511.18987 |
null |
| 2025-11-23 |
HiFi-MambaV2: Hierarchical Shared-Routed MoE for High-Fidelity MRI Reconstruction |
Pengcheng Fang et.al. |
2511.18534 |
null |
| 2025-11-23 |
Attosecond-resolved quantum fluctuations of light and matter |
Matan Even Tzur et.al. |
2511.18362 |
null |
| 2025-11-23 |
AnyExperts: On-Demand Expert Allocation for Multimodal Language Models with Mixture of Expert |
Yuting Gao et.al. |
2511.18314 |
null |
| 2025-11-22 |
PromptMoE: Generalizable Zero-Shot Anomaly Detection via Visually-Guided Prompt Mixtures |
Yuheng Shao et.al. |
2511.18116 |
null |
| 2025-11-22 |
CADTrack: Learning Contextual Aggregation with Deformable Alignment for Robust RGBT Tracking |
Hao Li et.al. |
2511.17967 |
null |
| 2025-11-22 |
FastMMoE: Accelerating Multimodal Large Language Models through Dynamic Expert Activation and Routing-Aware Token Pruning |
Guoyang Xia et.al. |
2511.17885 |
null |
| 2025-11-22 |
Equivalence of Context and Parameter Updates in Modern Transformer Blocks |
Adrian Goldwaser et.al. |
2511.17864 |
null |
| 2025-11-21 |
Unified Class and Domain Incremental Learning with Mixture of Experts for Indoor Localization |
Akhil Singampalli et.al. |
2511.17829 |
null |
| 2025-11-21 |
Sparse Mixture-of-Experts for Multi-Channel Imaging: Are All Channel Interactions Required? |
Sukwon Yun et.al. |
2511.17400 |
null |
| 2025-11-21 |
MCMoE: Completing Missing Modalities with Mixture of Experts for Incomplete Multimodal Action Quality Assessment |
Huangbiao Xu et.al. |
2511.17397 |
null |
| 2025-11-21 |
Measurements of differential charged-current cross sections on argon for electron neutrinos with final-state protons in MicroBooNE |
MicroBooNE collaboration et.al. |
2511.17342 |
null |
| 2025-11-21 |
Training Foundation Models on a Full-Stack AMD Platform: Compute, Networking, and System Design |
Quentin Anthony et.al. |
2511.17127 |
null |
| 2025-11-21 |
VLM-Augmented Degradation Modeling for Image Restoration Under Adverse Weather Conditions |
Qianyi Shao et.al. |
2511.16998 |
null |
| 2025-11-21 |
RadioKMoE: Knowledge-Guided Radiomap Estimation with Kolmogorov-Arnold Networks and Mixture-of-Experts |
Fupei Guo et.al. |
2511.16986 |
null |
| 2025-11-21 |
MicroMoE: Fine-Grained Load Balancing for Mixture-of-Experts with Token Scheduling |
Chenqi Zhao et.al. |
2511.16947 |
null |
| 2025-11-20 |
Mixture of Ranks with Degradation-Aware Routing for One-Step Real-World Image Super-Resolution |
Xiao He et.al. |
2511.16024 |
null |
| 2025-11-19 |
AquaSentinel: Next-Generation AI System Integrating Sensor Networks for Urban Underground Water Pipeline Anomaly Detection via Collaborative MoE-LLM Agent Architecture |
Qiming Guo et.al. |
2511.15870 |
null |
| 2025-11-19 |
MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping |
Yushi Huang et.al. |
2511.15690 |
null |
| 2025-11-19 |
VIRAL: Visual Sim-to-Real at Scale for Humanoid Loco-Manipulation |
Tairan He et.al. |
2511.15200 |
null |
| 2025-11-19 |
GPU-Initiated Networking for NCCL |
Khaled Hamidouche et.al. |
2511.15076 |
null |
| 2025-11-19 |
WiCo-PG: Wireless Channel Foundation Model for Pathloss Map Generation via Synesthesia of Machines |
Mingran Sun et.al. |
2511.15030 |
null |
| 2025-11-19 |
WiCo-MG: Wireless Channel Foundation Model for Multipath Generation via Synesthesia of Machines |
Zengrui Han et.al. |
2511.15026 |
null |
| 2025-11-19 |
Dynamic Expert Quantization for Scalable Mixture-of-Experts Inference |
Kexin Chu et.al. |
2511.15015 |
null |
| 2025-11-18 |
HMC: Learning Heterogeneous Meta-Control for Contact-Rich Loco-Manipulation |
Lai Wei et.al. |
2511.14756 |
null |
| 2025-11-18 |
Towards Stable and Structured Time Series Generation with Perturbation-Aware Flow Matching |
Jintao Zhang et.al. |
2511.14488 |
null |
| 2025-11-18 |
MoE-SpeQ: Speculative Quantized Decoding with Proactive Expert Prefetching and Offloading for Mixture-of-Experts |
Wenfeng Wang et.al. |
2511.14102 |
null |
| 2025-11-18 |
FAPE-IR: Frequency-Aware Planning and Execution Framework for All-in-One Image Restoration |
Jingren Liu et.al. |
2511.14099 |
null |
| 2025-11-18 |
SMGeo: Cross-View Object Geo-Localization with Grid-Level Mixture-of-Experts |
Fan Zhang et.al. |
2511.14093 |
null |
| 2025-11-17 |
MoMoE: A Mixture of Expert Agent Model for Financial Sentiment Analysis |
Peng Shu et.al. |
2511.13983 |
null |
| 2025-11-17 |
Introducing AI to an Online Petition Platform Changed Outputs but not Outcomes |
Isabel Corpus et.al. |
2511.13949 |
null |
| 2025-11-17 |
InterMoE: Individual-Specific 3D Human Interaction Generation via Dynamic Temporal-Selective MoE |
Lipeng Wang et.al. |
2511.13488 |
null |
| 2025-11-17 |
Measurement of Exclusive $π^+$-argon Interactions Using ProtoDUNE-SP
DUNE Collaboration et.al. |
2511.13462 |
null |
| 2025-11-18 |
YOLO Meets Mixture-of-Experts: Adaptive Expert Routing for Robust Object Detection |
Ori Meiraz et.al. |
2511.13344 |
null |
| 2025-11-17 |
Self-Adaptive Graph Mixture of Models |
Mohit Meena et.al. |
2511.13062 |
null |
| 2025-11-17 |
Tokenize Once, Recommend Anywhere: Unified Item Tokenization for Multi-domain LLM-based Recommendation |
Yu Hou et.al. |
2511.12922 |
null |
| 2025-11-16 |
Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data |
Yunxin Li et.al. |
2511.12609 |
null |
| 2025-11-16 |
SEMC: Structure-Enhanced Mixture-of-Experts Contrastive Learning for Ultrasound Standard Plane Recognition |
Qing Cai et.al. |
2511.12559 |
null |
| 2025-11-16 |
MdaIF: Robust One-Stop Multi-Degradation-Aware Image Fusion with Language-Driven Semantics |
Jing Li et.al. |
2511.12525 |
null |
| 2025-11-16 |
MOON2.0: Dynamic Modality-balanced Multimodal Representation Learning for E-commerce Product Understanding |
Zhanheng Nie et.al. |
2511.12449 |
null |
| 2025-11-15 |
SAC-MoE: Reinforcement Learning with Mixture-of-Experts for Control of Hybrid Dynamical Systems with Uncertainty |
Leroy D'Souza et.al. |
2511.12361 |
null |
| 2025-11-15 |
AMR-MoEGA: Antimicrobial Resistance Prediction using Mixture of Experts and Genetic Algorithms |
Anshul Bagaria et.al. |
2511.12223 |
null |
| 2025-11-15 |
ViTE: Virtual Graph Trajectory Expert Router for Pedestrian Trajectory Prediction |
Ruochen Li et.al. |
2511.12214 |
null |
| 2025-11-14 |
First Measurement of $π^+$-Ar and $p$-Ar Total Inelastic Cross Sections in the Sub-GeV Energy Regime with ProtoDUNE-SP Data
DUNE Collaboration et.al. |
2511.11925 |
null |
| 2025-11-14 |
FarSkip-Collective: Unhobbling Blocking Communication in Mixture of Experts Models |
Yonatan Dukler et.al. |
2511.11505 |
null |
| 2025-11-14 |
Rethinking Efficient Mixture-of-Experts for Remote Sensing Modality-Missing Classification |
Qinghao Gao et.al. |
2511.11460 |
null |
| 2025-11-14 |
Parameter-Efficient MoE LoRA for Few-Shot Multi-Style Editing |
Cong Cao et.al. |
2511.11236 |
null |
| 2025-11-14 |
DoReMi: A Domain-Representation Mixture Framework for Generalizable 3D Understanding |
Mingwei Xing et.al. |
2511.11232 |
null |
| 2025-11-14 |
ERMoE: Eigen-Reparameterized Mixture-of-Experts for Stable Routing and Interpretable Specialization |
Anzhe Cheng et.al. |
2511.10971 |
null |
| 2025-11-14 |
Go-UT-Bench: A Fine-Tuning Dataset for LLM-Based Unit Test Generation in Go |
Yashshi Pipalani et.al. |
2511.10868 |
null |
| 2025-11-13 |
Generalizable Slum Detection from Satellite Imagery with Mixture-of-Experts |
Sumin Lee et.al. |
2511.10300 |
null |
| 2025-11-13 |
RobIA: Robust Instance-aware Continual Test-time Adaptation for Deep Stereo |
Jueun Ko et.al. |
2511.10107 |
null |
| 2025-11-13 |
BuddyMoE: Exploiting Expert Redundancy to Accelerate Memory-Constrained Mixture-of-Experts Inference |
Yun Wang et.al. |
2511.10054 |
null |
| 2025-11-13 |
ConSurv: Multimodal Continual Learning for Survival Analysis |
Dianzhi Yu et.al. |
2511.09853 |
null |
| 2025-11-12 |
UniMM-V2X: MoE-Enhanced Multi-Level Fusion for End-to-End Cooperative Autonomous Driving |
Ziyi Song et.al. |
2511.09013 |
null |
| 2025-11-12 |
Selective Sinkhorn Routing for Improved Sparse Mixture of Experts |
Duc Anh Nguyen et.al. |
2511.08972 |
null |
| 2025-11-12 |
Bayesian Mixture of Experts For Large Language Models |
Maryam Dialameh et.al. |
2511.08968 |
null |
| 2025-11-11 |
OmniAID: Decoupling Semantic and Artifacts for Universal AI-Generated Image Detection in the Wild |
Yuncheng Guo et.al. |
2511.08423 |
null |
| 2025-11-11 |
Text-based Aerial-Ground Person Retrieval |
Xinyu Zhou et.al. |
2511.08369 |
null |
| 2025-11-13 |
National Institute on Aging PREPARE Challenge: Early Detection of Cognitive Impairment Using Speech -- The SpeechCARE Solution |
Maryam Zolnoori et.al. |
2511.08132 |
null |
| 2025-11-10 |
Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs |
Zhongyang Li et.al. |
2511.07419 |
null |
| 2025-11-10 |
AgenticSciML: Collaborative Multi-Agent Systems for Emergent Discovery in Scientific Machine Learning |
Qile Jiang et.al. |
2511.07262 |
null |
| 2025-11-10 |
S-DAG: A Subject-Based Directed Acyclic Graph for Multi-Agent Heterogeneous Reasoning |
Jiangwen Dong et.al. |
2511.06727 |
null |
| 2025-11-10 |
Multi-Modal Continual Learning via Cross-Modality Adapters and Representation Alignment with Knowledge Preservation |
Evelyn Chee et.al. |
2511.06723 |
null |
| 2025-11-09 |
Route Experts by Sequence, not by Token |
Tiansheng Wen et.al. |
2511.06494 |
null |
| 2025-11-09 |
HyMoERec: Hybrid Mixture-of-Experts for Sequential Recommendation |
Kunrong Li et.al. |
2511.06388 |
null |
| 2025-11-09 |
A Mixture-of-Experts Framework with Log-Logistic Components for Survival Analysis on Histopathology Images |
Ardhendu Sekhar et.al. |
2511.06266 |
null |
| 2025-11-08 |
DiA-gnostic VLVAE: Disentangled Alignment-Constrained Vision Language Variational AutoEncoder for Robust Radiology Reporting with Missing Modalities |
Nagur Shareef Shaik et.al. |
2511.05968 |
null |
| 2025-11-08 |
MoEGCL: Mixture of Ego-Graphs Contrastive Representation Learning for Multi-View Clustering |
Jian Zhu et.al. |
2511.05876 |
null |
| 2025-11-08 |
In-depth Analysis on Caching and Pre-fetching in Mixture of Experts Offloading |
Shuning Lin et.al. |
2511.05814 |
null |
| 2025-11-07 |
MoE-DP: An MoE-Enhanced Diffusion Policy for Robust Long-Horizon Robotic Manipulation with Skill Decomposition and Failure Recovery |
Baiye Cheng et.al. |
2511.05007 |
null |
| 2025-11-06 |
PuzzleMoE: Efficient Compression of Large Mixture-of-Experts Models via Sparse Expert Merging and Bit-packed inference |
Yushu Zhao et.al. |
2511.04805 |
null |
| 2025-11-06 |
GNN-MoE: Context-Aware Patch Routing using GNNs for Parameter-Efficient Domain Generalization |
Mahmoud Soliman et.al. |
2511.04008 |
null |
| 2025-11-05 |
GMoPE:A Prompt-Expert Mixture Framework for Graph Foundation Models |
Zhibin Wang et.al. |
2511.03251 |
null |
| 2025-11-04 |
RoME: Domain-Robust Mixture-of-Experts for MILP Solution Prediction across Domains |
Tianle Pu et.al. |
2511.02331 |
null |
| 2025-11-04 |
FP8-Flow-MoE: A Casting-Free FP8 Recipe without Double Quantization Error |
Fengjuan Wang et.al. |
2511.02302 |
null |
| 2025-11-04 |
Opportunistic Expert Activation: Batch-Aware Expert Routing for Faster Decode Without Retraining |
Costin-Andrei Oncescu et.al. |
2511.02237 |
null |
| 2025-11-03 |
Towards Efficient Federated Learning of Networked Mixture-of-Experts for Mobile Edge Computing |
Song Gao et.al. |
2511.01743 |
null |
| 2025-11-03 |
HMVLM: Human Motion-Vision-Lanuage Model via MoE LoRA |
Lei Hu et.al. |
2511.01463 |
null |
| 2025-11-04 |
CryptoMoE: Privacy-Preserving and Scalable Mixture of Experts Inference via Balanced Expert Routing |
Yifan Zhou et.al. |
2511.01197 |
null |
| 2025-11-03 |
DEER: Disentangled Mixture of Experts with Instance-Adaptive Routing for Generalizable Machine-Generated Text Detection |
Guoxin Ma et.al. |
2511.01192 |
null |
| 2025-11-01 |
OmniTrack++: Omnidirectional Multi-Object Tracking by Learning Large-FoV Trajectory Feedback |
Kai Luo et.al. |
2511.00510 |
null |
| 2025-10-31 |
LongCat-Flash-Omni Technical Report |
Meituan LongCat Team et.al. |
2511.00279 |
null |
| 2025-10-31 |
Phased DMD: Few-step Distribution Matching Distillation via Score Matching within Subintervals |
Xiangyu Fan et.al. |
2510.27684 |
null |
| 2025-10-31 |
RDMA Point-to-Point Communication for LLM Systems |
Nandor Licker et.al. |
2510.27656 |
null |
| 2025-10-31 |
MoRE: 3D Visual Geometry Reconstruction Meets Mixture-of-Experts |
Jingnan Gao et.al. |
2510.27234 |
null |
| 2025-10-31 |
AFM-Net: Advanced Fusing Hierarchical CNN Visual Priors with Global Sequence Modeling for Remote Sensing Image Scene Classification |
Yuanhao Tang et.al. |
2510.27155 |
null |
| 2025-10-30 |
Adaptive Data Flywheel: Applying MAPE Control Loops to AI Agent Improvement |
Aaditya Shukla et.al. |
2510.27051 |
null |
| 2025-10-30 |
Mixture-of-Transformers Learn Faster: A Theoretical Study on Classification Problems |
Hongbo Li et.al. |
2510.27004 |
null |
| 2025-10-30 |
MoME: Mixture of Visual Language Medical Experts for Medical Imaging Segmentation |
Arghavan Rezvani et.al. |
2510.26996 |
null |
| 2025-10-30 |
ExpertFlow: Adaptive Expert Scheduling and Memory Coordination for Efficient MoE Inference |
Zixu Shen et.al. |
2510.26730 |
null |
| 2025-10-30 |
Low-Altitude UAV-Carried Movable Antenna for Joint Wireless Power Transfer and Covert Communications |
Chuang Zhang et.al. |
2510.26628 |
null |
| 2025-10-30 |
MossNet: Mixture of State-Space Experts is a Multi-Head Attention |
Shikhar Tuli et.al. |
2510.26182 |
null |
| 2025-10-29 |
Dual Mixture-of-Experts Framework for Discrete-Time Survival Analysis |
Hyeonjun Lee et.al. |
2510.26014 |
null |
| 2025-10-31 |
Mixture-of-Experts Operator Transformer for Large-Scale PDE Pre-Training |
Hong Wang et.al. |
2510.25803 |
null |
| 2025-10-29 |
Revisiting scalable sequential recommendation with Multi-Embedding Approach and Mixture-of-Experts |
Qiushi Pan et.al. |
2510.25285 |
null |
| 2025-10-29 |
MoEntwine: Unleashing the Potential of Wafer-scale Chips for Large-scale Expert Parallel Inference |
Xinru Tang et.al. |
2510.25258 |
null |
| 2025-10-29 |
H3M-SSMoEs: Hypergraph-based Multimodal Learning with LLM Reasoning and Style-Structured Mixture of Experts |
Peilin Tan et.al. |
2510.25091 |
null |
| 2025-10-28 |
Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation |
Inclusion AI et.al. |
2510.24821 |
null |
| 2025-10-28 |
Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance |
Yujie Wei et.al. |
2510.24711 |
null |
| 2025-10-28 |
Language-Conditioned Representations and Mixture-of-Experts Policy for Robust Multi-Task Robotic Manipulation |
Xiucheng Zhang et.al. |
2510.24055 |
null |
| 2025-10-26 |
Sparsity and Superposition in Mixture of Experts |
Marmik Chaudhari et.al. |
2510.23671 |
null |
| 2025-10-27 |
EMTSF:Extraordinary Mixture of SOTA Models for Time Series Forecasting |
Musleh Alharthi et.al. |
2510.23396 |
null |
| 2025-10-27 |
Rethinking GSPO: The Perplexity-Entropy Equivalence |
Chi Liu et.al. |
2510.23142 |
null |
| 2025-10-27 |
Towards Stable and Effective Reinforcement Learning for Mixture-of-Experts |
Di Zhang et.al. |
2510.23027 |
null |
| 2025-10-27 |
MoEMeta: Mixture-of-Experts Meta Learning for Few-Shot Relational Learning |
Han Wu et.al. |
2510.23013 |
null |
| 2025-10-25 |
Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation |
Ling-Team et.al. |
2510.22115 |
null |
| 2025-10-24 |
PINN Balls: Scaling Second-Order Methods for PINNs with Domain Decomposition and Adaptive Sampling |
Andrea Bonfanti et.al. |
2510.21262 |
null |
| 2025-10-24 |
Adaptive Graph Mixture of Residual Experts: Unsupervised Learning on Diverse Graphs with Heterogeneous Specialization |
Yunlong Chu et.al. |
2510.21207 |
null |
| 2025-10-24 |
Controllable-LPMoE: Adapting to Challenging Object Segmentation via Dynamic Local Priors from Mixture-of-Experts |
Yanguang Sun et.al. |
2510.21114 |
null |
| 2025-10-24 |
MedAlign: A Synergistic Framework of Multimodal Preference Optimization and Federated Meta-Cognitive Reasoning |
Siyong Chen et.al. |
2510.21093 |
null |
| 2025-10-23 |
Bayesian Jammer Localization with a Hybrid CNN and Path-Loss Mixture of Experts |
Mariona Jaramillo-Civill et.al. |
2510.20666 |
null |
| 2025-10-23 |
xTime: Extreme Event Prediction with Hierarchical Knowledge Distillation and Expert Fusion |
Quan Li et.al. |
2510.20651 |
null |
| 2025-10-23 |
Metis-HOME: Hybrid Optimized Mixture-of-Experts for Multimodal Reasoning |
Xiaohan Lan et.al. |
2510.20519 |
null |
| 2025-10-23 |
A Parameter-Efficient Mixture-of-Experts Framework for Cross-Modal Geo-Localization |
LinFeng Li et.al. |
2510.20291 |
null |
| 2025-10-23 |
AsyncHZP: Hierarchical ZeRO Parallelism with Asynchronous Scheduling for Scalable LLM Training |
Huawei Bai et.al. |
2510.20111 |
null |
| 2025-10-22 |
HybridEP: Scaling Expert Parallelism to Cross-Datacenter Scenario via Hybrid Expert/Data Transmission |
Weihao Yang et.al. |
2510.19470 |
null |
| 2025-10-22 |
MoE-Prism: Disentangling Monolithic Experts for Elastic MoE Services via Model-System Co-Designs |
Xinfeng Xia et.al. |
2510.19366 |
null |
| 2025-10-22 |
Modeling Turn-Taking with Semantically Informed Gestures |
Varsha Suresh et.al. |
2510.19350 |
null |
| 2025-10-23 |
RailS: Load Balancing for All-to-All Communication in Distributed Mixture-of-Experts Training |
Heng Xu et.al. |
2510.19262 |
null |
| 2025-10-22 |
A Design Science Blueprint for an Orchestrated AI Assistant in Doctoral Supervision |
Teo Susnjak et.al. |
2510.19227 |
null |
| 2025-10-22 |
MoE-GS: Mixture of Experts for Dynamic Gaussian Splatting |
In-Hwan Jin et.al. |
2510.19210 |
null |
| 2025-10-21 |
Unifying and Enhancing Graph Transformers via a Hierarchical Mask Framework |
Yujie Xing et.al. |
2510.18825 |
null |
| 2025-10-21 |
Noise-Conditioned Mixture-of-Experts Framework for Robust Speaker Verification |
Bin Gu et.al. |
2510.18533 |
null |
| 2025-10-21 |
Training Diverse Graph Experts for Ensembles: A Systematic Empirical Study |
Gangda Deng et.al. |
2510.18370 |
null |
| 2025-10-19 |
L-MoE: End-to-End Training of a Lightweight Mixture of Low-Rank Adaptation Experts |
Shihao Ji et.al. |
2510.17898 |
null |
| 2025-10-20 |
Towards 3D Objectness Learning in an Open World |
Taichi Liu et.al. |
2510.17686 |
null |
| 2025-10-20 |
Intelligent Communication Mixture-of-Experts Boosted-Medical Image Segmentation Foundation Model |
Xinwei Zhang et.al. |
2510.17684 |
null |
| 2025-10-20 |
Learned Inertial Odometry for Cycling Based on Mixture of Experts Algorithm |
Hao Qiao et.al. |
2510.17604 |
null |
| 2025-10-20 |
ReXMoE: Reusing Experts with Minimal Overhead in Mixture-of-Experts |
Zheyue Tan et.al. |
2510.17483 |
null |
| 2025-10-19 |
End-to-end Listen, Look, Speak and Act |
Siyin Wang et.al. |
2510.16756 |
null |
| 2025-10-18 |
NeurIPT: Foundation Model for Neural Interfaces |
Zitao Fang et.al. |
2510.16548 |
null |
| 2025-10-18 |
Input Domain Aware MoE: Decoupling Routing Decisions from Task Optimization in Mixture of Experts |
Yongxiang Hua et.al. |
2510.16448 |
null |
| 2025-10-18 |
Modeling Expert Interactions in Sparse Mixture of Experts via Graph Structures |
Minh-Khoi Nguyen-Nhat et.al. |
2510.16411 |
null |
| 2025-10-17 |
Expert Merging in Sparse Mixture of Experts with Nash Bargaining |
Dung V. Nguyen et.al. |
2510.16138 |
null |
| 2025-10-17 |
Mixture of Experts Approaches in Dense Retrieval Tasks |
Effrosyni Sokli et.al. |
2510.15683 |
null |
| 2025-10-17 |
FlexiReID: Adaptive Mixture of Expert for Multi-Modal Person Re-Identification |
Zhen Sun et.al. |
2510.15595 |
null |
| 2025-10-17 |
Backdoor or Manipulation? Graph Mixture of Experts Can Defend Against Various Graph Adversarial Attacks |
Yuyuan Feng et.al. |
2510.15333 |
null |
| 2025-10-17 |
MTmixAtt: Integrating Mixture-of-Experts with Multi-Mix Attention for Large-Scale Recommendation |
Xianyang Qi et.al. |
2510.15286 |
null |
| 2025-10-17 |
Adaptive Individual Uncertainty under Out-Of-Distribution Shift with Expert-Routed Conformal Prediction |
Amitesh Badkul et.al. |
2510.15233 |
null |
| 2025-10-16 |
Rewiring Experts on the Fly:Continuous Rerouting for Better Online Adaptation in Mixture-of-Expert models |
Guinan Su et.al. |
2510.14853 |
null |
| 2025-10-16 |
MergeMoE: Efficient Compression of MoE Models via Expert Output Merging |
Ruijie Miao et.al. |
2510.14436 |
null |
| 2025-10-16 |
Expertise need not monopolize: Action-Specialized Mixture of Experts for Vision-Language-Action Learning |
Weijie Shen et.al. |
2510.14300 |
null |
| 2025-10-16 |
MACE: Mixture-of-Experts Accelerated Coordinate Encoding for Large-Scale Scene Localization and Rendering |
Mingkai Liu et.al. |
2510.14251 |
null |
| 2025-10-15 |
REAP the Experts: Why Pruning Prevails for One-Shot MoE compression |
Mike Lasby et.al. |
2510.13999 |
null |
| 2025-10-15 |
Steer-MoE: Efficient Audio-Language Alignment with a Mixture-of-Experts Steering Module |
Ruitao Feng et.al. |
2510.13558 |
null |
| 2025-10-15 |
ExpressNet-MoE: A Hybrid Deep Neural Network for Emotion Recognition |
Deeptimaan Banerjee et.al. |
2510.13493 |
null |
| 2025-10-15 |
Who Speaks for the Trigger? Dynamic Expert Routing in Backdoored Mixture-of-Experts Transformers |
Xin Zhao et.al. |
2510.13462 |
null |
| 2025-10-15 |
Toward Efficient Inference Attacks: Shadow Model Sharing via Mixture-of-Experts |
Li Bai et.al. |
2510.13451 |
null |
| 2025-10-15 |
UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE |
Zhenyu Liu et.al. |
2510.13344 |
null |
| 2025-10-15 |
GatePro: Parameter-Free Expert Selection Optimization for Mixture-of-Experts Models |
Chen Zheng et.al. |
2510.13079 |
null |
| 2025-10-14 |
Dendrograms of Mixing Measures for Softmax-Gated Gaussian Mixture of Experts: Consistency without Model Sweeps |
Do Tien Hai et.al. |
2510.12744 |
null |
| 2025-10-14 |
MoBiLE: Efficient Mixture-of-Experts Inference on Consumer GPU with Mixture of Big Little Experts |
Yushu Zhao et.al. |
2510.12357 |
null |
| 2025-10-14 |
DE3S: Dual-Enhanced Soft-Sparse-Shape Learning for Medical Early Time-Series Classification |
Tao Xie et.al. |
2510.12214 |
null |
| 2025-10-13 |
Beyond 'Templates': Category-Agnostic Object Pose, Size, and Shape Estimation from a Single View |
Jinyu Zhang et.al. |
2510.11687 |
null |
| 2025-10-13 |
Robust Ego-Exo Correspondence with Long-Term Memory |
Yijun Hu et.al. |
2510.11417 |
null |
| 2025-10-13 |
Stabilizing MoE Reinforcement Learning by Aligning Training and Inference Routers |
Wenhan Ma et.al. |
2510.11370 |
null |
| 2025-10-13 |
What to expect from microscopic nuclear modelling for $k_{\rm eff}$ calculations? |
D. Rochman et.al. |
2510.11256 |
null |
| 2025-10-13 |
MC#: Mixture Compressor for Mixture-of-Experts Large Models |
Wei Huang et.al. |
2510.10962 |
null |
| 2025-10-12 |
Crisis-Aware Regime-Conditioned Diffusion with CVaR Allocation |
Ali Atiah Alzahrani et.al. |
2510.10807 |
null |
| 2025-10-12 |
Equipping Vision Foundation Model with Mixture of Experts for Out-of-Distribution Detection |
Shizhen Zhao et.al. |
2510.10584 |
null |
| 2025-10-12 |
Hierarchical LoRA MoE for Efficient CTR Model Scaling |
Zhichen Zeng et.al. |
2510.10432 |
null |
| 2025-10-11 |
SP-MoE: Speculative Decoding and Prefetching for Accelerating MoE-based Model Inference |
Liangkun Chen et.al. |
2510.10302 |
null |
| 2025-10-10 |
MTMD: A Multi-Task Multi-Domain Framework for Unified Ad Lightweight Ranking at Pinterest |
Xiao Yang et.al. |
2510.09857 |
null |
| 2025-10-10 |
Dense2MoE: Restructuring Diffusion Transformer to MoE for Efficient Text-to-Image Generation |
Youwei Zheng et.al. |
2510.09094 |
null |
| 2025-10-09 |
LinearSR: Unlocking Linear Attention for Stable and Efficient Image Super-Resolution |
Xiaohui Li et.al. |
2510.08771 |
null |
| 2025-10-09 |
FlyLoRA: Boosting Task Decoupling and Parameter Efficiency via Implicit Rank-Wise Mixture-of-Experts |
Heming Zou et.al. |
2510.08396 |
null |
| 2025-10-09 |
Mix- and MoE-DPO: A Variational Inference Approach to Direct Preference Optimization |
Jason Bohne et.al. |
2510.08256 |
null |
| 2025-10-09 |
From Tokens to Layers: Redefining Stall-Free Scheduling for LLM Serving with Layered Prefill |
Gunjun Lee et.al. |
2510.08055 |
null |
| 2025-10-09 |
Recycling Pretrained Checkpoints: Orthogonal Growth of Mixture-of-Experts for Efficient Large Language Model Pre-Training |
Ruizhe Wang et.al. |
2510.08008 |
null |
| 2025-10-09 |
Multilingual Knowledge Graph Completion via Efficient Multilingual Knowledge Sharing |
Cunli Mao et.al. |
2510.07736 |
null |
| 2025-10-09 |
Mutual Learning for Hashing: Unlocking Strong Hash Functions from Weak Supervision |
Xiaoxu Ma et.al. |
2510.07703 |
null |
| 2025-10-09 |
LiveThinking: Enabling Real-Time Efficient Reasoning for AI-Powered Livestreaming via Reinforcement Learning |
Yuhan Sun et.al. |
2510.07685 |
null |
| 2025-10-08 |
MoGU: Mixture-of-Gaussians with Uncertainty-based Gating for Time Series Forecasting |
Yoli Shavit et.al. |
2510.07459 |
null |
| 2025-10-08 |
Less is More: Strategic Expert Selection Outperforms Ensemble Complexity in Traffic Forecasting |
Walid Guettala et.al. |
2510.07426 |
null |
| 2025-10-08 |
Guided by the Experts: Provable Feature Learning Dynamic of Soft-Routed Mixture-of-Experts |
Fangshuo Liao et.al. |
2510.07205 |
null |
| 2025-10-08 |
A Bridge from Audio to Video: Phoneme-Viseme Alignment Allows Every Face to Speak Multiple Languages |
Zibo Su et.al. |
2510.06612 |
null |
| 2025-10-09 |
SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation |
Shuang Cheng et.al. |
2510.06303 |
null |
| 2025-10-06 |
Reproducibility Study of "XRec: Large Language Models for Explainable Recommendation" |
Ranjan Mishra et.al. |
2510.06275 |
null |
| 2025-10-08 |
Barbarians at the Gate: How AI is Upending Systems Research |
Audrey Cheng et.al. |
2510.06189 |
null |
| 2025-10-07 |
Rasterized Steered Mixture of Experts for Efficient 2D Image Regression |
Yi-Hsin Li et.al. |
2510.05814 |
null |
| 2025-10-07 |
MSF-SER: Enriching Acoustic Modeling with Multi-Granularity Semantics for Speech Emotion Recognition |
Haoxun Li et.al. |
2510.05749 |
null |
| 2025-10-07 |
Orders in Chaos: Enhancing Large-Scale MoE LLM Serving with Data Movement Forecasting |
Zhongkai Yu et.al. |
2510.05497 |
null |
| 2025-10-06 |
Stratum: System-Hardware Co-Design with Tiered Monolithic 3D-Stackable DRAM for Efficient MoE Serving |
Yue Pan et.al. |
2510.05245 |
null |
| 2025-10-06 |
REN: Anatomically-Informed Mixture-of-Experts for Interstitial Lung Disease Diagnosis |
Alec K. Peltekian et.al. |
2510.04923 |
null |
| 2025-10-06 |
LMM-Incentive: Large Multimodal Model-based Incentive Design for User-Generated Content in Web 3.0 |
Jinbo Wen et.al. |
2510.04765 |
null |
| 2025-10-06 |
Multilingual Routing in Mixture-of-Experts |
Lucas Bandarkar et.al. |
2510.04694 |
null |
| 2025-10-06 |
Improving Multimodal Brain Encoding Model with Dynamic Subject-awareness Routing |
Xuanhua Yin et.al. |
2510.04670 |
null |
| 2025-10-05 |
HoRA: Cross-Head Low-Rank Adaptation with Joint Hypernetworks |
Nghiem T. Diep et.al. |
2510.04295 |
null |
| 2025-10-05 |
SliceMoE: Routing Embedding Slices Instead of Tokens for Fine-Grained and Balanced Transformer Scaling |
Harshil Vejendla et.al. |
2510.04286 |
null |
| 2025-10-05 |
MoME: Mixture of Matryoshka Experts for Audio-Visual Speech Recognition |
Umberto Cappellazzo et.al. |
2510.04136 |
null |
| 2025-10-03 |
Mixture of Many Zero-Compute Experts: A High-Rate Quantization Theory Perspective |
Yehuda Dar et.al. |
2510.03151 |
null |
| 2025-10-02 |
ElasticMoE: An Efficient Auto Scaling Method for Mixture-of-Experts Models |
Gursimran Singh et.al. |
2510.02613 |
null |
| 2025-10-02 |
UpSafe$^\circ$C: Upcycling for Controllable Safety in Large Language Models |
Yuhao Sun et.al. |
2510.02194 |
null |
| 2025-10-02 |
LadderMoE: Ladder-Side Mixture of Experts Adapters for Bronze Inscription Recognition |
Rixin Zhou et.al. |
2510.01651 |
null |
| 2025-10-01 |
Dirichlet-Prior Shaping: Guiding Expert Specialization in Upcycled MoEs |
Leyla Mirvakhabova et.al. |
2510.01185 |
null |
| 2025-10-01 |
Learning Compact Representations of LLM Abilities via Item Response Theory |
Jianhao Chen et.al. |
2510.00844 |
null |
| 2025-10-01 |
Graph Integrated Multimodal Concept Bottleneck Model |
Jiakai Lin et.al. |
2510.00701 |
null |
| 2025-10-01 |
FAME: Adaptive Functional Attention with Expert Routing for Function-on-Function Regression |
Yifei Gao et.al. |
2510.00621 |
null |
| 2025-10-01 |
Adaptive Shared Experts with LoRA-Based Mixture of Experts for Multi-Task Learning |
Minghao Yang et.al. |
2510.00570 |
null |
| 2025-09-30 |
FlowMoE: A Scalable Pipeline Scheduling Framework for Distributed Mixture-of-Experts Training |
Yunqi Gao et.al. |
2510.00207 |
null |
| 2025-09-30 |
Training Matryoshka Mixture-of-Experts for Elastic Inference-Time Expert Utilization |
Yaoxiang Wang et.al. |
2509.26520 |
null |
| 2025-09-30 |
Nephrobase Cell+: Multimodal Single-Cell Foundation Model for Decoding Kidney Biology |
Chenyu Li et.al. |
2509.26223 |
null |
| 2025-09-30 |
Towards Unified Multimodal Misinformation Detection in Social Media: A Benchmark Dataset and Baseline |
Haiyang Li et.al. |
2509.25991 |
null |
| 2025-09-30 |
UniMMAD: Unified Multi-Modal and Multi-Class Anomaly Detection via MoE-Driven Feature Decompression |
Yuan Zhao et.al. |
2509.25934 |
null |
| 2025-09-30 |
Understanding the Mixture-of-Experts with Nadaraya-Watson Kernel |
Chuanyang Zheng et.al. |
2509.25913 |
null |
| 2025-10-01 |
A Multimodal LLM Approach for Visual Question Answering on Multiparametric 3D Brain MRI |
Arvind Murari Vepa et.al. |
2509.25889 |
null |
| 2025-09-30 |
Collaborative Compression for Large-Scale MoE Deployment on Edge |
Yixiao Chen et.al. |
2509.25689 |
null |
| 2025-09-30 |
LD-MoLE: Learnable Dynamic Routing for Mixture of LoRA Experts |
Yuan Zhuang et.al. |
2509.25684 |
null |
| 2025-09-30 |
Guiding Mixture-of-Experts with Temporal Multimodal Interactions |
Xing Han et.al. |
2509.25678 |
null |
| 2025-09-29 |
K-Prism: A Knowledge-Guided and Prompt Integrated Universal Medical Image Segmentation Model |
Bangwei Guo et.al. |
2509.25594 |
null |
| 2025-09-29 |
GRACE-MoE: Grouping and Replication with Locality-Aware Routing for Efficient Distributed MoE Inference |
Yu Han et.al. |
2509.25041 |
null |
| 2025-09-29 |
LEAF: A Robust Expert-Based Framework for Few-Shot Continual Event Detection |
Bao-Ngoc Dao et.al. |
2509.24547 |
null |
| 2025-09-29 |
One-Prompt Strikes Back: Sparse Mixture of Experts for Prompt-based Continual Learning |
Minh Le et.al. |
2509.24483 |
null |
| 2025-09-29 |
Muon: Training and Trade-offs with Latent Attention and MoE |
Sushant Mehta et.al. |
2509.24406 |
null |
| 2025-09-29 |
LLaDA-MoE: A Sparse MoE Diffusion Language Model |
Fengqi Zhu et.al. |
2509.24389 |
null |
| 2025-09-29 |
Uni-NTFM: A Unified Foundation Model for EEG Signal Representation Learning |
Zhisheng Chen et.al. |
2509.24222 |
null |
| 2025-09-28 |
HunyuanImage 3.0 Technical Report |
Siyu Cao et.al. |
2509.23951 |
null |
| 2025-09-28 |
Beyond Benchmarks: Understanding Mixture-of-Experts Models through Internal Mechanisms |
Jiahao Ying et.al. |
2509.23933 |
null |
| 2025-09-28 |
Bayesian Mixture-of-Experts: Towards Making LLMs Know What They Don't Know |
Albus Yizhuo Li et.al. |
2509.23830 |
null |
| 2025-09-28 |
A Modality-Tailored Graph Modeling Framework for Urban Region Representation via Contrastive Learning |
Yaya Zhao et.al. |
2509.23772 |
null |
| 2025-09-26 |
Dynamic Experts Search: Enhancing Reasoning in Mixture-of-Experts LLMs at Test Time |
Yixuan Han et.al. |
2509.22572 |
null |
| 2025-09-26 |
Learning to Ball: Composing Policies for Long-Horizon Basketball Moves |
Pei Xu et.al. |
2509.22442 |
null |
| 2025-09-26 |
Role-Aware Multi-modal federated learning system for detecting phishing webpages |
Bo Wang et.al. |
2509.22369 |
null |
| 2025-09-26 |
HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space |
Ke Li et.al. |
2509.22299 |
null |
| 2025-09-26 |
Unlocking the Power of Mixture-of-Experts for Task-Aware Time Series Analytics |
Xingjian Wu et.al. |
2509.22279 |
null |
| 2025-09-26 |
MultiCrafter: High-Fidelity Multi-Subject Generation via Spatially Disentangled Attention and Identity-Aware Reinforcement Learning |
Tao Wu et.al. |
2509.21953 |
null |
| 2025-09-26 |
Elastic MoE: Unlocking the Inference-Time Scalability of Mixture-of-Experts |
Naibin Gu et.al. |
2509.21892 |
null |
| 2025-09-26 |
ChaosNexus: A Foundation Model for Universal Chaotic System Forecasting with Multi-scale Representations |
Chang Liu et.al. |
2509.21802 |
null |
| 2025-09-26 |
LongScape: Advancing Long-Horizon Embodied World Models with Context-Aware MoE |
Yu Shang et.al. |
2509.21790 |
null |
| 2025-09-25 |
Distributed Specialization: Rare-Token Neurons in Large Language Models |
Jing Liu et.al. |
2509.21163 |
null |
| 2025-09-26 |
Expanding Reasoning Potential in Foundation Model by Learning Diverse Chains of Thought Patterns |
Xuemiao Zhang et.al. |
2509.21124 |
null |
| 2025-09-25 |
Physics Informed Neural Networks for design optimisation of diamond particle detectors for charged particle fast-tracking at high luminosity hadron colliders |
Alessandro Bombini et.al. |
2509.21123 |
null |
| 2025-09-24 |
Dynamic Reasoning Chains through Depth-Specialized Mixture-of-Experts in Transformer Architectures |
Sampurna Roy et.al. |
2509.20577 |
null |
| 2025-09-24 |
SHMoAReg: Spark Deformable Image Registration via Spatial Heterogeneous Mixture of Experts and Attention Heads |
Yuxi Zheng et.al. |
2509.20073 |
null |
| 2025-09-24 |
Faster, Smaller, and Smarter: Task-Aware Expert Merging for Online MoE Inference |
Ziyi Han et.al. |
2509.19781 |
null |
| 2025-09-23 |
DevFD: Developmental Face Forgery Detection by Learning Shared and Orthogonal LoRA Subspaces |
Tianshuo Zhang et.al. |
2509.19230 |
null |
| 2025-09-23 |
Frequency-Domain Decomposition and Recomposition for Robust Audio-Visual Segmentation |
Yunzhe Shen et.al. |
2509.18912 |
null |
| 2025-09-23 |
LongCat-Flash-Thinking Technical Report |
Meituan LongCat Team et.al. |
2509.18883 |
null |
| 2025-09-23 |
PIE: Perception and Interaction Enhanced End-to-End Motion Planning for Autonomous Driving |
Chengran Yuan et.al. |
2509.18609 |
null |
| 2025-09-23 |
Symphony-MoE: Harmonizing Disparate Pre-trained Models into a Coherent Mixture-of-Experts |
Qi Wang et.al. |
2509.18542 |
null |
| 2025-09-23 |
StableGuard: Towards Unified Copyright Protection and Tamper Localization in Latent Diffusion Models |
Haoxin Yang et.al. |
2509.17993 |
null |
| 2025-09-23 |
Optimizing Inference in Transformer-Based Models: A Multi-Method Benchmark |
Siu Hang Ho et.al. |
2509.17894 |
null |
| 2025-09-22 |
Expert-as-a-Service: Towards Efficient, Scalable, and Robust Large-scale MoE Serving |
Ziming Liu et.al. |
2509.17863 |
null |
| 2025-09-22 |
Attention-based Mixture of Experts for Robust Speech Deepfake Detection |
Viola Negroni et.al. |
2509.17585 |
null |
| 2025-09-22 |
Robust Mixture Models for Algorithmic Fairness Under Latent Heterogeneity |
Siqi Li et.al. |
2509.17411 |
null |
| 2025-09-21 |
MoEs Are Stronger than You Think: Hyper-Parallel Inference Scaling with RoE |
Soheil Zibakhsh et.al. |
2509.17238 |
null |
| 2025-09-21 |
CoBEVMoE: Heterogeneity-aware Feature Fusion with Dynamic Mixture-of-Experts for Collaborative Perception |
Lingzhao Kong et.al. |
2509.17107 |
null |
| 2025-09-21 |
Dynamic Expert Specialization: Towards Catastrophic Forgetting-Free Multi-Domain MoE Adaptation |
Junzhuo Li et.al. |
2509.16882 |
null |
| 2025-09-20 |
KungfuBot2: Learning Versatile Motion Skills for Humanoid Whole-Body Control |
Jinrui Han et.al. |
2509.16638 |
null |
| 2025-09-19 |
DiEP: Adaptive Mixture-of-Experts Compression through Differentiable Expert Pruning |
Sikai Bai et.al. |
2509.16105 |
null |
| 2025-09-19 |
MoE-CE: Enhancing Generalization for Deep Learning based Channel Estimation via a Mixture-of-Experts Framework |
Tianyu Li et.al. |
2509.15964 |
null |
| 2025-09-19 |
pFedSAM: Personalized Federated Learning of Segment Anything Model for Medical Image Segmentation |
Tong Wang et.al. |
2509.15638 |
null |
| 2025-09-19 |
MEC-Quant: Maximum Entropy Coding for Extremely Low Bit Quantization-Aware Training |
Junbiao Pang et.al. |
2509.15514 |
null |
| 2025-09-18 |
Beyond Spurious Signals: Debiasing Multimodal Large Language Models via Counterfactual Inference and Adaptive Expert Routing |
Zichen Wu et.al. |
2509.15361 |
null |
| 2025-09-18 |
Super-Linear: A Lightweight Pretrained Mixture of Linear Experts for Time Series Forecasting |
Liran Nochumsohn et.al. |
2509.15105 |
null |
| 2025-09-18 |
Adaptive LoRA Experts Allocation and Selection for Federated Fine-Tuning |
Lei Wang et.al. |
2509.15087 |
null |
| 2025-09-18 |
EchoVLM: Dynamic Mixture-of-Experts Vision-Language Model for Universal Ultrasound Intelligence |
Chaoyin She et.al. |
2509.14977 |
null |
| 2025-09-18 |
FURINA: Free from Unmergeable Router via LINear Aggregation of mixed experts |
Jiayi Han et.al. |
2509.14900 |
null |
| 2025-09-18 |
CollabVLA: Self-Reflective Vision-Language-Action Model Dreaming Together with Human |
Nan Sun et.al. |
2509.14889 |
null |
| 2025-09-17 |
CSMoE: An Efficient Remote Sensing Foundation Model with Soft Mixture-of-Experts |
Leonard Hackel et.al. |
2509.14104 |
null |
| 2025-09-18 |
SAIL-VL2 Technical Report |
Weijie Yin et.al. |
2509.14033 |
null |
| 2025-09-17 |
Semi-MoE: Mixture-of-Experts meets Semi-Supervised Histopathology Segmentation |
Nguyen Lan Vi Vu et.al. |
2509.13834 |
null |
| 2025-09-18 |
Mixture-of-Experts Framework for Field-of-View Enhanced Signal-Dependent Binauralization of Moving Talkers |
Manan Mittal et.al. |
2509.13548 |
null |
| 2025-09-18 |
GLAD: Global-Local Aware Dynamic Mixture-of-Experts for Multi-Talker ASR |
Yujie Guo et.al. |
2509.13093 |
null |
| 2025-09-16 |
Dual-Stage Reweighted MoE for Long-Tailed Egocentric Mistake Detection |
Boyu Han et.al. |
2509.12990 |
null |
| 2025-09-16 |
Bridging Perception and Planning: Towards End-to-End Planning for Signal Temporal Logic Tasks |
Bowen Ye et.al. |
2509.12813 |
null |
| 2025-09-16 |
MEGAN: Mixture of Experts for Robust Uncertainty Estimation in Endoscopy Videos |
Damola Agbelese et.al. |
2509.12772 |
null |
| 2025-09-17 |
NavMoE: Hybrid Model- and Learning-based Traversability Estimation for Local Navigation via Mixture of Experts |
Botao He et.al. |
2509.12747 |
null |
| 2025-09-16 |
AsyMoE: Leveraging Modal Asymmetry for Enhanced Expert Specialization in Large Vision-Language Models |
Heng Zhang et.al. |
2509.12715 |
null |
| 2025-10-24 |
Efficient Multimodal Streaming Recommendation via Expandable Side Mixture-of-Experts |
Yunke Qu et.al. |
2508.05993 |
null |
| 2025-07-23 |
Towards Greater Leverage: Scaling Laws for Efficient Mixture-of-Experts Language Models |
Changxin Tian et.al. |
2507.17702 |
null |
| 2025-07-23 |
Mammo-Mamba: A Hybrid State-Space and Transformer Architecture with Sequential Mixture of Experts for Multi-View Mammography |
Farnoush Bayatmakou et.al. |
2507.17662 |
null |
| 2025-07-23 |
InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation |
Shuai Yang et.al. |
2507.17520 |
null |
| 2025-07-23 |
Dynamic-DINO: Fine-Grained Mixture of Experts Tuning for Real-time Open-Vocabulary Object Detection |
Yehao Lu et.al. |
2507.17436 |
null |
| 2025-07-23 |
A Versatile Pathology Co-pilot via Reasoning Enhanced Multimodal Large Language Model |
Zhe Xu et.al. |
2507.17303 |
null |
| 2025-07-23 |
BrownoutServe: SLO-Aware Inference Serving under Bursty Workloads for MoE-based LLMs |
Jianmin Hu et.al. |
2507.17133 |
null |
| 2025-07-22 |
GATEBLEED: Exploiting On-Core Accelerator Power Gating for High Performance & Stealthy Attacks on AI |
Joshua Kalyanapu et.al. |
2507.17033 |
null |
| 2025-07-22 |
Mixture-of-Expert Variational Autoencoders for Cross-Modality Embedding of Type Ia Supernova Data |
Yunyi Shen et.al. |
2507.16817 |
null |
| 2025-07-22 |
Reducing GPU Memory Fragmentation via Spatio-Temporal Planning for Efficient Large-Scale Model Training |
Zixiao Huang et.al. |
2507.16274 |
null |
| 2025-07-21 |
Applying multimodal learning to Classify transient Detections Early (AppleCiDEr) I: Data set, methods, and infrastructure |
Alexandra Junell et.al. |
2507.16088 |
null |
| 2025-07-21 |
Just Ask for Music (JAM): Multimodal and Personalized Natural Language Music Recommendation |
Alessandro B. Melchiorre et.al. |
2507.15826 |
null |
| 2025-07-21 |
The New LLM Bottleneck: A Systems Perspective on Latent Attention and Mixture-of-Experts |
Sungmin Yun et.al. |
2507.15465 |
null |
| 2025-07-21 |
Universal crystal material property prediction via multi-view geometric fusion in graph transformers |
Liang Zhang et.al. |
2507.15303 |
null |
| 2025-07-20 |
CoMoCAVs: Cohesive Decision-Guided Motion Planning for Connected and Autonomous Vehicles with Multi-Policy Reinforcement Learning |
Pan Hu et.al. |
2507.14903 |
null |
| 2025-07-23 |
GEMINUS: Dual-aware Global and Scene-Adaptive Mixture-of-Experts for End-to-End Autonomous Driving |
Chi Wan et.al. |
2507.14456 |
null |
| 2025-07-18 |
SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing |
Yingying Zhang et.al. |
2507.13812 |
null |
| 2025-07-17 |
Apple Intelligence Foundation Language Models: Tech Report 2025 |
Hanzhi Zhou et.al. |
2507.13575 |
null |
| 2025-07-17 |
R^2MoE: Redundancy-Removal Mixture of Experts for Lifelong Concept Learning |
Xiaohan Guo et.al. |
2507.13107 |
null |
| 2025-07-16 |
Astro-MoE: Mixture of Experts for Multiband Astronomical Time Series |
Martina Cádiz-Leyton et.al. |
2507.12611 |
null |
| 2025-07-16 |
Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models |
Gen Luo et.al. |
2507.12566 |
null |
| 2025-07-17 |
Mixture of Raytraced Experts |
Andrea Perin et.al. |
2507.12419 |
null |
| 2025-07-16 |
CorrMoE: Mixture of Experts with De-stylization Learning for Cross-Scene and Cross-Domain Correspondence Pruning |
Peiwen Xia et.al. |
2507.11834 |
null |
| 2025-07-15 |
Mixture of Experts in Large Language Models |
Danyang Zhang et.al. |
2507.11181 |
null |
| 2025-07-15 |
Atmos-Bench: 3D Atmospheric Structures for Climate Insight |
Tianchi Xu et.al. |
2507.11085 |
null |
| 2025-07-14 |
DeepSeek: Paradigm Shifts and Technical Evolution in Large AI Models |
Luolin Xiong et.al. |
2507.09955 |
null |
| 2025-07-14 |
ESG-Net: Event-Aware Semantic Guided Network for Dense Audio-Visual Event Localization |
Huilai Li et.al. |
2507.09945 |
null |
| 2025-07-14 |
Multi-residual Mixture of Experts Learning for Cooperative Control in Multi-vehicle Systems |
Vindula Jayawardana et.al. |
2507.09836 |
null |
| 2025-07-13 |
Explainable AI in Genomics: Transcription Factor Binding Site Prediction with Mixture of Experts |
Aakash Tripathi et.al. |
2507.09754 |
null |
| 2025-07-13 |
Inter2Former: Dynamic Hybrid Attention for Efficient High-Precision Interactive |
You Huang et.al. |
2507.09612 |
null |
| 2025-07-12 |
PPJudge: Towards Human-Aligned Assessment of Artistic Painting Process |
Shiqi Jiang et.al. |
2507.09242 |
null |
| 2025-07-11 |
BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity |
Chenyang Song et.al. |
2507.08771 |
null |
| 2025-07-11 |
CircFormerMoE: An End-to-End Deep Learning Framework for Circular RNA Splice Site Detection and Pairing in Plant Genomes |
Tianyou Jiang et.al. |
2507.08542 |
null |
| 2025-07-11 |
White-Basilisk: A Hybrid Model for Code Vulnerability Detection |
Ioannis Lamprou et.al. |
2507.08540 |
null |
| 2025-07-15 |
KAT-V1: Kwai-AutoThink Technical Report |
Zizheng Zhan et.al. |
2507.08297 |
null |
| 2025-07-11 |
Data-Driven Dimensional Synthesis of Diverse Planar Four-bar Function Generation Mechanisms via Direct Parameterization |
Woon Ryong Kim et.al. |
2507.08269 |
null |
| 2025-07-10 |
MoSE: Skill-by-Skill Mixture-of-Expert Learning for Autonomous Driving |
Lu Xu et.al. |
2507.07818 |
null |
| 2025-07-10 |
When Large Language Models Meet Law: Dual-Lens Taxonomy, Technical Advances, and Ethical Governance |
Peizhang Shao et.al. |
2507.07748 |
null |
| 2025-07-09 |
Leveraging Manifold Embeddings for Enhanced Graph Transformer Representations and Learning |
Ankit Jyothish et.al. |
2507.07335 |
null |
| 2025-07-08 |
Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate |
A. Bochkov et.al. |
2507.07129 |
null |
| 2025-07-09 |
4KAgent: Agentic Any Image to 4K Super-Resolution |
Yushen Zuo et.al. |
2507.07105 |
null |
| 2025-07-11 |
FlexOlmo: Open Language Models for Flexible Data Use |
Weijia Shi et.al. |
2507.07024 |
null |
| 2025-07-09 |
Deep Disentangled Representation Network for Treatment Effect Estimation |
Hui Meng et.al. |
2507.06650 |
null |
| 2025-07-09 |
SlimCaching: Edge Caching of Mixture-of-Experts for Distributed Inference |
Qian Chen et.al. |
2507.06567 |
null |
| 2025-07-09 |
MoFE-Time: Mixture of Frequency Domain Experts for Time-Series Forecasting Models |
Yiwen Liu et.al. |
2507.06502 |
null |
| 2025-07-08 |
Mamba Goes HoME: Hierarchical Soft Mixture-of-Experts for 3D Medical Image Segmentation |
Szymon Płotka et.al. |
2507.06363 |
null |
| 2025-07-08 |
Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis |
Xintong Hu et.al. |
2507.06116 |
null |
| 2025-07-09 |
A Survey on Prompt Tuning |
Zongqian Li et.al. |
2507.06085 |
null |
| 2025-07-08 |
Remember Past, Anticipate Future: Learning Continual Multimodal Misinformation Detectors |
Bing Wang et.al. |
2507.05939 |
null |
| 2025-07-08 |
What You Have is What You Track: Adaptive and Robust Multimodal Tracking |
Yuedong Tan et.al. |
2507.05899 |
null |
| 2025-07-08 |
Omni-Router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition |
Zijin Gu et.al. |
2507.05724 |
null |
| 2025-07-08 |
Efficient Training of Large-Scale AI Models Through Federated Mixture-of-Experts: A System-Level Approach |
Xiaobing Chen et.al. |
2507.05685 |
null |
| 2025-07-08 |
City-Level Foreign Direct Investment Prediction with Tabular Learning on Judicial Data |
Tianxing Wu et.al. |
2507.05651 |
null |
| 2025-07-07 |
QMoE: A Quantum Mixture of Experts Framework for Scalable Quantum Neural Networks |
Hoang-Quan Nguyen et.al. |
2507.05190 |
null |
| 2025-07-07 |
NTSFormer: A Self-Teaching Graph Transformer for Multimodal Cold-Start Node Classification |
Jun Hu et.al. |
2507.04870 |
null |
| 2025-07-07 |
DRAE: Dynamic Retrieval-Augmented Expert Networks for Lifelong Learning and Task Adaptation in Robotics |
Yayu Long et.al. |
2507.04661 |
null |
| 2025-07-08 |
UGG-ReID: Uncertainty-Guided Graph Model for Multi-Modal Object Re-Identification |
Xixi Wan et.al. |
2507.04638 |
null |
| 2025-07-07 |
Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts |
Yun Wang et.al. |
2507.04631 |
null |
| 2025-07-05 |
Towards Accurate and Efficient 3D Object Detection for Autonomous Driving: A Mixture of Experts Computing System on Edge |
Linshen Liu et.al. |
2507.04123 |
null |
| 2025-07-05 |
From Query to Explanation: Uni-RAG for Multi-Modal Retrieval-Augmented Learning in STEM |
Xinyi Wu et.al. |
2507.03868 |
null |
| 2025-07-04 |
Decoupled Relative Learning Rate Schedules |
Jan Ludziejewski et.al. |
2507.03526 |
null |
| 2025-07-03 |
Neural Inhibition Improves Dynamic Routing and Mixture of Experts |
Will Y. Zou et.al. |
2507.03221 |
null |
| 2025-07-03 |
System-performance and cost modeling of Large Language Model training and inference |
Wenzhe Guo et.al. |
2507.02456 |
null |
| 2025-07-03 |
NLP4Neuro: Sequence-to-sequence learning for neural population decoding |
Jacob J. Morra et.al. |
2507.02264 |
null |
| 2025-07-02 |
MoIRA: Modular Instruction Routing Architecture for Multi-Task Robotics |
Dmytro Kuzmenko et.al. |
2507.01843 |
null |
| 2025-07-02 |
Mixtures of Neural Network Experts with Application to Phytoplankton Flow Cytometry Data |
Ethan Pawl et.al. |
2507.01375 |
null |
| 2025-07-02 |
Long-Tailed Distribution-Aware Router For Mixture-of-Experts in Large Vision-Language Model |
Chaoxiang Cai et.al. |
2507.01351 |
null |
| 2025-07-02 |
Dynamical Multimodal Fusion with Mixture-of-Experts for Localizations |
Bohao Wang et.al. |
2507.01337 |
null |
| 2025-07-02 |
ExPaMoE: An Expandable Parallel Mixture of Experts for Continual Test-Time Adaptation |
JianChao Zhao et.al. |
2507.00502 |
null |
| 2025-07-01 |
MoNE: Replacing Redundant Experts with Lightweight Novices for Structured Pruning of MoE |
Geng Zhang et.al. |
2507.00390 |
null |
| 2025-06-30 |
MotionGPT3: Human Motion as a Second Modality |
Bingfan Zhu et.al. |
2506.24086 |
null |
| 2025-06-30 |
MReg: A Novel Regression Model with MoE-based Video Feature Mining for Mitral Regurgitation Diagnosis |
Zhe Liu et.al. |
2506.23648 |
null |
| 2025-06-30 |
Towards Building Private LLMs: Exploring Multi-Node Expert Parallelism on Apple Silicon for Mixture-of-Experts Large Language Model |
Mu-Chi Chen et.al. |
2506.23635 |
null |
| 2025-06-29 |
Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging |
Lujun Li et.al. |
2506.23266 |
null |
| 2025-06-29 |
External Data-Enhanced Meta-Representation for Adaptive Probabilistic Load Forecasting |
Haoran Li et.al. |
2506.23201 |
null |
| 2025-06-29 |
Hierarchical Corpus-View-Category Refinement for Carotid Plaque Risk Grading in Ultrasound |
Zhiyuan Zhu et.al. |
2506.23108 |
null |
| 2025-07-01 |
Hecto: Modular Sparse Experts for Adaptive and Interpretable Reasoning |
Sanskar Pandey et.al. |
2506.22919 |
null |
| 2025-06-27 |
Towards Distributed Neural Architectures |
Aditya Cowsik et.al. |
2506.22389 |
null |
| 2025-06-27 |
MPipeMoE: Memory Efficient MoE for Pre-trained Models with Adaptive Pipeline Parallelism |
Zheng Zhang et.al. |
2506.22175 |
null |
| 2025-06-27 |
DeepTalk: Towards Seamless and Smart Speech Interaction with Adaptive Modality-Specific MoE |
Hang Shao et.al. |
2506.21864 |
null |
| 2025-06-26 |
Latent Prototype Routing: Achieving Near-Perfect Load Balancing in Mixture-of-Experts |
Jiajie Yang et.al. |
2506.21328 |
null |
| 2025-06-26 |
Learning to Skip the Middle Layers of Transformers |
Tim Lawson et.al. |
2506.21103 |
null |
| 2025-06-26 |
Little By Little: Continual Learning via Self-Activated Sparse Mixture-of-Rank Adaptive Learning |
Haodong Lu et.al. |
2506.21035 |
null |
| 2025-06-26 |
EVA: Mixture-of-Experts Semantic Variant Alignment for Compositional Zero-Shot Learning |
Xiao Zhang et.al. |
2506.20986 |
null |
| 2025-06-25 |
Opportunistic Osteoporosis Diagnosis via Texture-Preserving Self-Supervision, Mixture of Experts and Multi-Task Integration |
Jiaxing Huang et.al. |
2506.20282 |
null |
| 2025-06-23 |
Multimodal Anomaly Detection with a Mixture-of-Experts |
Christoph Willibald et.al. |
2506.19077 |
null |
| 2025-06-23 |
Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models |
Zihan Wang et.al. |
2506.18945 |
null |
| 2025-06-23 |
Shift Happens: Mixture of Experts based Continual Adaptation in Federated Learning |
Rahul Atul Bhope et.al. |
2506.18789 |
null |
| 2025-06-23 |
An Audio-centric Multi-task Learning Framework for Streaming Ads Targeting on Spotify |
Shivam Verma et.al. |
2506.18735 |
null |
| 2025-06-23 |
Security Assessment of DeepSeek and GPT Series Models against Jailbreak Attacks |
Xiaodong Wu et.al. |
2506.18543 |
null |
| 2025-06-23 |
SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation |
Zichong Li et.al. |
2506.18349 |
null |
| 2025-06-23 |
Sharpening the Spear: Adaptive Expert-Guided Adversarial Attack Against DRL-based Autonomous Driving Policies |
Junchao Fan et.al. |
2506.18304 |
null |
| 2025-06-22 |
Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection |
Zheng Zhan et.al. |
2506.18145 |
null |
| 2025-06-21 |
Incorporating Rather Than Eliminating: Achieving Fairness for Skin Disease Diagnosis Through Group-Specific Expert |
Gelei Xu et.al. |
2506.17787 |
null |
| 2025-06-21 |
Physics-informed mixture of experts network for interpretable battery degradation trajectory computation amid second-life complexities |
Xinghao Huang et.al. |
2506.17755 |
null |
| 2025-06-21 |
PDC-Net: Pattern Divide-and-Conquer Network for Pelvic Radiation Injury Segmentation |
Xinyu Xiong et.al. |
2506.17712 |
null |
| 2025-06-20 |
SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification |
Zhenglin Lai et.al. |
2506.17368 |
null |
| 2025-06-19 |
FLAME: Towards Federated Fine-Tuning Large Language Models Through Adaptive SMoE |
Khiem Le et.al. |
2506.16600 |
null |
| 2025-06-19 |
Optimizing MoE Routers: Design, Implementation, and Evaluation in Transformer Models |
Daniel Fidel Harvey et.al. |
2506.16419 |
null |
| 2025-06-17 |
Scaling Intelligence: Designing Data Centers for Next-Gen Language Models |
Jesmin Jahan Tithi et.al. |
2506.15006 |
null |
| 2025-06-17 |
NeuroMoE: A Transformer-Based Mixture-of-Experts Framework for Multi-Modal Neurological Disorder Classification |
Wajih Hassan Raza et.al. |
2506.14970 |
null |
| 2025-06-17 |
GMT: General Motion Tracking for Humanoid Whole-Body Control |
Zixuan Chen et.al. |
2506.14770 |
null |
| 2025-06-17 |
Exploring Speaker Diarization with Mixture of Experts |
Gaobin Yang et.al. |
2506.14750 |
null |
| 2025-06-18 |
Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs |
Ling Team et.al. |
2506.14731 |
null |
| 2025-06-17 |
GuiLoMo: Allocating Expert Number and Rank for LoRA-MoE via Bilevel Optimization with GuidedSelection Vectors |
Hengyuan Zhang et.al. |
2506.14646 |
link |
| 2025-06-17 |
Single-Example Learning in a Mixture of GPDMs with Latent Geometries |
Jesse St. Amand et.al. |
2506.14563 |
null |
| 2025-06-17 |
MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models |
Hongyu Wang et.al. |
2506.14435 |
null |
| 2025-06-16 |
Load Balancing Mixture of Experts with Similarity Preserving Routers |
Nabil Omi et.al. |
2506.14038 |
null |
| 2025-06-16 |
GRaD-Nav++: Vision-Language Model Enabled Visual Drone Navigation with Gaussian Radiance Fields and Differentiable Dynamics |
Qianzhong Chen et.al. |
2506.14009 |
null |
| 2025-06-16 |
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention |
MiniMax et.al. |
2506.13585 |
link |
| 2025-06-16 |
Mixture of Weight-shared Heterogeneous Group Attention Experts for Dynamic Token-wise KV Optimization |
Guanghui Song et.al. |
2506.13541 |
null |
| 2025-06-16 |
EAQuant: Enhancing Post-Training Quantization for MoE Models via Expert-Aware Optimization |
Zhongqian Fu et.al. |
2506.13329 |
link |
| 2025-06-16 |
Breaking Thought Patterns: A Multi-Dimensional Reasoning Framework for LLMs |
Xintong Tang et.al. |
2506.13192 |
null |
| 2025-06-15 |
Serving Large Language Models on Huawei CloudMatrix384 |
Pengfei Zuo et.al. |
2506.12708 |
null |
| 2025-06-14 |
Automatic Expert Discovery in LLM Upcycling via Sparse Interpolated Mixture-of-Experts |
Shengzhuang Chen et.al. |
2506.12597 |
null |
| 2025-06-14 |
Topology-Assisted Spatio-Temporal Pattern Disentangling for Scalable MARL in Large-scale Autonomous Traffic Control |
Rongpeng Li et.al. |
2506.12453 |
null |
| 2025-06-17 |
HarMoEny: Efficient Multi-GPU Inference of MoE Models |
Zachary Doucet et.al. |
2506.12417 |
null |
| 2025-06-14 |
Group then Scale: Dynamic Mixture-of-Experts Multilingual Language Model |
Chong Li et.al. |
2506.12388 |
null |
| 2025-06-13 |
Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources? |
Houyi Li et.al. |
2506.12119 |
null |
| 2025-06-13 |
Structural Similarity-Inspired Unfolding for Lightweight Image Super-Resolution |
Zhangkai Ni et.al. |
2506.11823 |
link |
| 2025-06-12 |
Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts |
Zaijing Li et.al. |
2506.10357 |
null |
| 2025-06-11 |
GigaChat Family: Efficient Russian Language Modeling Through Mixture of Experts Architecture |
GigaChat team et.al. |
2506.09440 |
null |
| 2025-06-11 |
DIVE into MoE: Diversity-Enhanced Reconstruction of Large Language Models from Dense into Mixture-of-Experts |
Yuchen Feng et.al. |
2506.09351 |
null |
| 2025-06-10 |
CoQMoE: Co-Designed Quantization and Computation Orchestration for Mixture-of-Experts Vision Transformer on FPGA |
Jiale Dong et.al. |
2506.08496 |
link |
| 2025-06-11 |
MedMoE: Modality-Specialized Mixture of Experts for Medical Vision-Language Understanding |
Shivang Chopra et.al. |
2506.08356 |
null |
| 2025-06-11 |
STAMImputer: Spatio-Temporal Attention MoE for Traffic Data Imputation |
Yiming Wang et.al. |
2506.08054 |
link |
| 2025-06-09 |
A Two-Phase Deep Learning Framework for Adaptive Time-Stepping in High-Speed Flow Modeling |
Jacob Helwig et.al. |
2506.07969 |
link |
| 2025-06-09 |
M2Restore: Mixture-of-Experts-based Mamba-CNN Fusion Framework for All-in-One Image Restoration |
Yongzhen Wang et.al. |
2506.07814 |
null |
| 2025-06-11 |
MIRA: Medical Time Series Foundation Model for Real-World Health Data |
Hao Li et.al. |
2506.07584 |
null |
| 2025-06-11 |
MoE-MLoRA for Multi-Domain CTR Prediction: Efficient Adaptation with Expert Specialization |
Ken Yaggel et.al. |
2506.07563 |
link |
| 2025-06-09 |
MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts |
Wei Tao et.al. |
2506.07533 |
null |
| 2025-06-09 |
MoE-GPS: Guidlines for Prediction Strategy for Dynamic Expert Duplication in MoE Load Balancing |
Haiyue Ma et.al. |
2506.07366 |
null |
| 2025-06-08 |
UNO: Unified Self-Supervised Monocular Odometry for Platform-Agnostic Deployment |
Wentao Zhao et.al. |
2506.07013 |
null |
| 2025-06-07 |
High-Fidelity Scientific Simulation Surrogates via Adaptive Implicit Neural Representations |
Ziwei Li et.al. |
2506.06858 |
null |
| 2025-06-07 |
Breaking Data Silos: Towards Open and Scalable Mobility Foundation Models via Generative Continual Learning |
Yuan Yuan et.al. |
2506.06694 |
null |
| 2025-06-06 |
Bridging Perception and Action: Spatially-Grounded Mid-Level Representations for Robot Generalization |
Jonathan Yang et.al. |
2506.06196 |
null |
| 2025-06-06 |
MoA: Heterogeneous Mixture of Adapters for Parameter-Efficient Fine-Tuning of Large Language Models |
Jie Cao et.al. |
2506.05928 |
null |
| 2025-06-06 |
dots.llm1 Technical Report |
Bi Huo et.al. |
2506.05767 |
null |
| 2025-06-05 |
Mixture-of-Experts Meets In-Context Reinforcement Learning |
Wenhao Wu et.al. |
2506.05426 |
null |
| 2025-06-05 |
Lifelong Evolution: Collaborative Learning between Large and Small Language Models for Continuous Emergent Fake News Detection |
Ziyi Zhou et.al. |
2506.04739 |
null |
| 2025-06-05 |
FlashDMoE: Fast Distributed MoE in a Single Kernel |
Osayamen Jonathan Aimuyo et.al. |
2506.04667 |
link |
| 2025-06-04 |
Resolving Task Objective Conflicts in Unified Multimodal Understanding and Generation via Task-Aware Mixture-of-Experts |
Jiaxing Zhang et.al. |
2506.03591 |
null |
| 2025-06-04 |
PC-MoE: Memory-Efficient and Privacy-Preserving Collaborative Training for Mixture-of-Experts LLMs |
Ze Yu Zhang et.al. |
2506.02965 |
null |
| 2025-06-03 |
Scaling Fine-Grained MoE Beyond 50B Parameters: Empirical Evaluation and Practical Insights |
Jakub Krajewski et.al. |
2506.02890 |
null |
| 2025-06-03 |
Brain-Like Processing Pathways Form in Models With Heterogeneous Experts |
Jack Cook et.al. |
2506.02813 |
null |
| 2025-06-04 |
MemoryOut: Learning Principal Features via Multimodal Sparse Filtering Network for Semi-supervised Video Anomaly Detection |
Juntong Li et.al. |
2506.02535 |
null |
| 2025-06-03 |
MidPO: Dual Preference Optimization for Safety and Helpfulness in Large Language Models via a Mixture of Experts Framework |
Yupeng Qi et.al. |
2506.02460 |
null |
| 2025-05-31 |
Enhancing Multimodal Continual Instruction Tuning with BranchLoRA |
Duzhen Zhang et.al. |
2506.02041 |
null |
| 2025-06-02 |
SPACE: Your Genomic Profile Predictor is a Powerful DNA Foundation Model |
Zhao Yang et.al. |
2506.01833 |
link |
| 2025-06-02 |
Mixture of Experts Provably Detect and Learn the Latent Cluster Structure in Gradient-Based Learning |
Ryotaro Kawata et.al. |
2506.01656 |
null |
| 2025-06-02 |
DeepSeek in Healthcare: A Survey of Capabilities, Risks, and Clinical Applications of Open-Source Large Language Models |
Jiancheng Ye et.al. |
2506.01257 |
null |
| 2025-06-01 |
Unlocking Personalized Knowledge in Federated Large Language Model: The Power of Mixture of Experts |
Fan Liu et.al. |
2506.00965 |
null |
| 2025-05-30 |
Mixture-of-Experts for Personalized and Semantic-Aware Next Location Prediction |
Shuai Liu et.al. |
2505.24597 |
null |
| 2025-05-30 |
Decoding Knowledge Attribution in Mixture-of-Experts: A Framework of Basic-Refinement Collaboration and Efficiency Analysis |
Junzhuo Li et.al. |
2505.24593 |
null |
| 2025-05-30 |
Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer |
Yilun Kong et.al. |
2505.24378 |
link |
| 2025-05-30 |
GradPower: Powering Gradients for Faster Language Model Pre-Training |
Mingze Wang et.al. |
2505.24275 |
null |
| 2025-05-30 |
On the Expressive Power of Mixture-of-Experts for Structured Complex Tasks |
Mingze Wang et.al. |
2505.24205 |
null |
| 2025-05-29 |
Point-MoE: Towards Cross-Domain Generalization in 3D Semantic Segmentation via Mixture-of-Experts |
Xuweiyi Chen et.al. |
2505.23926 |
null |
| 2025-06-03 |
Noise-Robustness Through Noise: Asymmetric LoRA Adaption with Poisoning Expert |
Zhaokun Wang et.al. |
2505.23868 |
null |
| 2025-05-29 |
From Knowledge to Noise: CTIM-Rover and the Pitfalls of Episodic Memory in Software Engineering Agents |
Tobias Lindenbauer et.al. |
2505.23422 |
link |
| 2025-05-29 |
Context-Aware Semantic Communication for the Wireless Networks |
Guangyuan Liu et.al. |
2505.23249 |
null |
| 2025-05-29 |
Two Is Better Than One: Rotations Scale LoRAs |
Hongcan Guo et.al. |
2505.23184 |
null |
| 2025-05-28 |
HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer |
Qi Cai et.al. |
2505.22705 |
link |
| 2025-05-28 |
Less, but Better: Efficient Multilingual Expansion for LLMs via Layer-wise Mixture-of-Experts |
Xue Zhang et.al. |
2505.22582 |
null |
| 2025-05-28 |
A Human-Centric Approach to Explainable AI for Personalized Education |
Vinitra Swamy et.al. |
2505.22541 |
link |
| 2025-05-28 |
Identity-Preserving Text-to-Image Generation via Dual-Level Feature Decoupling and Expert-Guided Fusion |
Kewen Chen et.al. |
2505.22360 |
null |
| 2025-05-28 |
Advancing Expert Specialization for Better MoE |
Hongcan Guo et.al. |
2505.22323 |
null |
| 2025-05-28 |
ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation |
Jiawen Yu et.al. |
2505.22159 |
null |
| 2025-05-28 |
AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation |
Yan Rong et.al. |
2505.22053 |
null |
| 2025-05-28 |
Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge |
Zhongyi Zhou et.al. |
2505.21906 |
null |
| 2025-05-27 |
MedBridge: Bridging Foundation Vision-Language Models to Medical Image Diagnosis |
Yitong Li et.al. |
2505.21698 |
null |
| 2025-05-28 |
Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity |
Yehui Tang et.al. |
2505.21411 |
null |
| 2025-05-27 |
Unveiling Instruction-Specific Neurons & Experts: An Analytical Framework for LLM's Instruction-Following Capabilities |
Junyan Zhang et.al. |
2505.21191 |
null |
| 2025-05-27 |
Uni3D-MoE: Scalable Multimodal 3D Scene Understanding via Mixture of Experts |
Yue Zhang et.al. |
2505.21079 |
null |
| 2025-05-27 |
Multi-objective Large Language Model Alignment with Hierarchical Experts |
Zhuo Li et.al. |
2505.20925 |
null |
| 2025-05-26 |
FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models |
Hao Kang et.al. |
2505.20225 |
link |
| 2025-05-26 |
NEXT: Multi-Grained Mixture of Experts via Text-Modulation for Multi-Modal Object Re-ID |
Shihao Li et.al. |
2505.20001 |
null |
| 2025-05-26 |
Mosaic: Data-Free Knowledge Distillation via Mixture-of-Experts for Heterogeneous Distributed Environments |
Junming Liu et.al. |
2505.19699 |
null |
| 2025-05-26 |
MoESD: Unveil Speculative Decoding's Potential for Accelerating Sparse MoE |
Zongle Huang et.al. |
2505.19645 |
null |
| 2025-05-26 |
Rethinking Gating Mechanism in Sparse MoE: Handling Arbitrary Modality Inputs with Confidence-Guided Gate |
Liangwei Nathan Zheng et.al. |
2505.19525 |
link |
| 2025-05-26 |
WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference |
Sihan Chen et.al. |
2505.19427 |
link |
| 2025-05-25 |
RankLLM: A Python Package for Reranking with LLMs |
Sahel Sharifymoghaddam et.al. |
2505.19284 |
null |
| 2025-05-25 |
I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts |
Jiayi Xin et.al. |
2505.19190 |
link |
| 2025-05-24 |
TrajMoE: Spatially-Aware Mixture of Experts for Unified Human Mobility Modeling |
Chonghua Han et.al. |
2505.18670 |
null |
| 2025-05-24 |
ThanoRA: Task Heterogeneity-Aware Multi-Task Low-Rank Adaptation |
Jian Liang et.al. |
2505.18640 |
link |
| 2025-05-24 |
Mod-Adapter: Tuning-Free and Versatile Multi-concept Personalization via Modulation Adapter |
Weizhi Zhong et.al. |
2505.18612 |
null |
| 2025-05-23 |
Enhancing CTR Prediction with De-correlated Expert Networks |
Jiancheng Wang et.al. |
2505.17925 |
null |
| 2025-05-23 |
PreMoe: Lightening MoEs on Constrained Memory by Expert Pruning and Retrieval |
Zehua Pei et.al. |
2505.17639 |
null |
| 2025-05-23 |
CoMoE: Contrastive Representation for Mixture-of-Experts in Parameter-Efficient Fine-tuning |
Jinyuan Feng et.al. |
2505.17553 |
null |
| 2025-05-23 |
MEGADance: Mixture-of-Experts Architecture for Genre-Aware 3D Dance Generation |
Kaixing Yang et.al. |
2505.17543 |
null |
| 2025-05-22 |
JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model |
Qihao Duan et.al. |
2505.17257 |
null |
| 2025-05-22 |
DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving |
Zhenjie Yang et.al. |
2505.16278 |
null |
| 2025-05-22 |
DualComp: End-to-End Learning of a Unified Dual-Modality Lossless Compressor |
Yan Zhao et.al. |
2505.16256 |
null |
| 2025-05-21 |
Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models |
Jingcong Liang et.al. |
2505.16056 |
link |
| 2025-05-21 |
MoRE-Brain: Routed Mixture of Experts for Interpretable and Generalizable Cross-Subject fMRI Visual Decoding |
Yuxiang Wei et.al. |
2505.15946 |
null |
| 2025-05-21 |
CoLA: Collaborative Low-Rank Adaptation |
Yiyun Zhou et.al. |
2505.15471 |
link |
| 2025-05-22 |
Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought |
Tencent Hunyuan Team et.al. |
2505.15431 |
null |
| 2025-05-21 |
Efficient Data Driven Mixture-of-Expert Extraction from Trained Networks |
Uranik Berisha et.al. |
2505.15414 |
null |
| 2025-05-21 |
Time Tracker: Mixture-of-Experts-Enhanced Foundation Time Series Forecasting Model with Decoupled Training Pipelines |
Xiaohou Shi et.al. |
2505.15151 |
null |
| 2025-05-20 |
Multimodal Cultural Safety: Evaluation Frameworks and Alignment Strategies |
Haoyi Qiu et.al. |
2505.14972 |
link |
| 2025-05-20 |
Balanced and Elastic End-to-end Training of Dynamic LLMs |
Mohamed Wahib et.al. |
2505.14864 |
null |
| 2025-05-20 |
Solving MNIST with a globally trained Mixture of Quantum Experts |
Paolo Alessandro Xavier Tognini et.al. |
2505.14789 |
null |
| 2025-05-20 |
Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training |
Mengru Wang et.al. |
2505.14681 |
null |
| 2025-05-21 |
Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach |
Umberto Cappellazzo et.al. |
2505.14336 |
null |
| 2025-05-20 |
FuxiMT: Sparsifying Large Language Models for Chinese-Centric Multilingual Machine Translation |
Shaolin Zhu et.al. |
2505.14256 |
null |
| 2025-05-20 |
THOR-MoE: Hierarchical Task-Guided and Context-Responsive Routing for Neural Machine Translation |
Yunlong Liang et.al. |
2505.14173 |
null |
| 2025-05-20 |
Multimodal Mixture of Low-Rank Experts for Sentiment Analysis and Emotion Recognition |
Shuo Zhang et.al. |
2505.14143 |
null |
| 2025-05-20 |
Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging |
Ryo Bertolissi et.al. |
2505.14136 |
null |
| 2025-05-20 |
StPR: Spatiotemporal Preservation and Routing for Exemplar-Free Video Class-Incremental Learning |
Huaijie Wang et.al. |
2505.13997 |
null |
| 2025-05-20 |
Towards Rehearsal-Free Continual Relation Extraction: Capturing Within-Task Variance with Adaptive Prompting |
Bao-Ngoc Dao et.al. |
2505.13944 |
link |
| 2025-05-20 |
U-SAM: An audio language Model for Unified Speech, Audio, and Music Understanding |
Ziqian Wang et.al. |
2505.13880 |
link |
| 2025-05-20 |
EfficientLLM: Efficiency in Large Language Models |
Zhengqing Yuan et.al. |
2505.13840 |
null |
| 2025-05-19 |
CompeteSMoE -- Statistically Guaranteed Mixture of Experts Training via Competition |
Nam V. Nguyen et.al. |
2505.13380 |
link |
| 2025-05-19 |
Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training and Inference |
Shuqing Luo et.al. |
2505.13345 |
link |
| 2025-05-19 |
Seeing the Unseen: How EMoE Unveils Bias in Text-to-Image Diffusion Models |
Lucas Berry et.al. |
2505.13273 |
null |
| 2025-05-19 |
True Zero-Shot Inference of Dynamical Systems Preserving Long-Term Statistics |
Christoph Jürgen Hemmer et.al. |
2505.13192 |
null |
| 2025-05-19 |
Model Selection for Gaussian-gated Gaussian Mixture of Experts Using Dendrograms of Mixing Measures |
Tuan Thai et.al. |
2505.13052 |
null |
| 2025-05-18 |
Scene-Adaptive Motion Planning with Explicit Mixture of Experts and Interaction-Oriented Optimization |
Hongbiao Zhu et.al. |
2505.12311 |
null |
| 2025-05-20 |
Model Merging in Pre-training of Large Language Models |
Yunshui Li et.al. |
2505.12082 |
null |
| 2025-05-20 |
Multi-modal Collaborative Optimization and Expansion Network for Event-assisted Single-eye Expression Recognition |
Runduo Han et.al. |
2505.12007 |
link |
| 2025-05-17 |
MINGLE: Mixtures of Null-Space Gated Low-Rank Experts for Test-Time Continual Model Merging |
Zihuan Qiu et.al. |
2505.11883 |
null |
| 2025-05-17 |
Improving Coverage in Combined Prediction Sets with Weighted p-values |
Gina Wong et.al. |
2505.11785 |
null |
| 2025-05-16 |
MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production |
Chao Jin et.al. |
2505.11432 |
null |
| 2025-05-16 |
MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems |
Yinsicheng Jiang et.al. |
2505.11415 |
null |
| 2025-05-16 |
A Fast Kernel-based Conditional Independence test with Application to Causal Discovery |
Oliver Schacht et.al. |
2505.11085 |
null |
| 2025-05-16 |
On DeepSeekMoE: Statistical Benefits of Shared Experts and Normalized Sigmoid Gating |
Huy Nguyen et.al. |
2505.10860 |
null |
| 2025-05-14 |
PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning |
Zongqian Li et.al. |
2505.09519 |
link |
| 2025-05-14 |
Qwen3 Technical Report |
An Yang et.al. |
2505.09388 |
link |
| 2025-05-14 |
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures |
Chenggang Zhao et.al. |
2505.09343 |
null |
| 2025-05-13 |
Toward Cost-Efficient Serving of Mixture-of-Experts with Asynchrony |
Shaoyu Wang et.al. |
2505.08944 |
null |
| 2025-05-13 |
PWC-MoE: Privacy-Aware Wireless Collaborative Mixture of Experts |
Yang Su et.al. |
2505.08719 |
null |
| 2025-05-13 |
AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale |
Yunjie Ji et.al. |
2505.08311 |
null |
| 2025-05-12 |
UMoE: Unifying Attention and FFN with Shared Experts |
Yuanhang Yang et.al. |
2505.07260 |
null |
| 2025-05-11 |
Seed1.5-VL Technical Report |
Dong Guo et.al. |
2505.07062 |
null |
| 2025-05-11 |
FreqMoE: Dynamic Frequency Enhancement for Neural PDE Solvers |
Tianyu Chen et.al. |
2505.06858 |
null |
| 2025-05-11 |
The power of fine-grained experts: Granularity boosts expressivity in Mixture of Experts |
Enric Boix-Adsera et.al. |
2505.06839 |
null |
| 2025-05-10 |
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free |
Zihan Qiu et.al. |
2505.06708 |
link |
| 2025-05-10 |
Emotion-Qwen: Training Hybrid Experts for Unified Emotion and General Vision-Language Understanding |
Dawei Huang et.al. |
2505.06685 |
link |
| 2025-05-10 |
QoS-Efficient Serving of Multiple Mixture-of-Expert LLMs Using Partial Runtime Reconfiguration |
HamidReza Imani et.al. |
2505.06481 |
null |
| 2025-05-12 |
FloE: On-the-Fly MoE Inference on Memory-constrained GPU |
Yuxin Zhou et.al. |
2505.05950 |
null |
| 2025-05-09 |
MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design |
Haojie Duanmu et.al. |
2505.05799 |
link |
| 2025-05-08 |
Divide-and-Conquer: Cold-Start Bundle Recommendation via Mixture of Diffusion Experts |
Ming Li et.al. |
2505.05035 |
null |
| 2025-05-07 |
Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs |
Yehui Tang et.al. |
2505.04519 |
null |
| 2025-05-07 |
SToLa: Self-Adaptive Touch-Language Framework with Tactile Commonsense Reasoning in Open-Ended Scenarios |
Ning Cheng et.al. |
2505.04201 |
null |
| 2025-05-07 |
LLM-e Guess: Can LLMs Capabilities Advance Without Hardware Progress? |
Teddy Foley et.al. |
2505.04075 |
link |
| 2025-05-07 |
Shadow Wireless Intelligence: Large Language Model-Driven Reasoning in Covert Communications |
Yuanai Xie et.al. |
2505.04068 |
null |
| 2025-05-06 |
Towards Smart Point-and-Shoot Photography |
Jiawan Li et.al. |
2505.03638 |
null |
| 2025-05-06 |
Faster MoE LLM Inference for Extremely Large Models |
Haoqi Yang et.al. |
2505.03531 |
null |
| 2025-05-06 |
STAR-Rec: Making Peace with Length Variance and Pattern Diversity in Sequential Recommendation |
Maolin Wang et.al. |
2505.03484 |
null |
| 2025-05-06 |
3D Gaussian Splatting Data Compression with Mixture of Priors |
Lei Liu et.al. |
2505.03310 |
null |
| 2025-05-05 |
Finger Pose Estimation for Under-screen Fingerprint Sensor |
Xiongjun Guan et.al. |
2505.02481 |
link |
| 2025-05-05 |
Multimodal Deep Learning-Empowered Beam Prediction in Future THz ISAC Systems |
Kai Zhang et.al. |
2505.02381 |
null |
| 2025-05-05 |
Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques |
Sanjay Surendranath Girija et.al. |
2505.02309 |
null |
| 2025-05-04 |
Learning Heterogeneous Mixture of Scene Experts for Large-scale Neural Radiance Fields |
Zhenxing Mi et.al. |
2505.02005 |
link |
| 2025-05-03 |
Backdoor Attacks Against Patch-based Mixture of Experts |
Cedric Chan et.al. |
2505.01811 |
link |
| 2025-05-01 |
MoxE: Mixture of xLSTM Experts with Entropy-Aware Routing for Efficient Language Modeling |
Abdoul Majid O. Thiombiano et.al. |
2505.01459 |
null |
| 2025-05-02 |
Aggregation of Dependent Expert Distributions in Multimodal Variational Autoencoders |
Rogelio A Mancisidor et.al. |
2505.01134 |
null |
| 2025-05-02 |
CoCoAFusE: Beyond Mixtures of Experts via Model Fusion |
Aurelio Raffa Ugolini et.al. |
2505.01105 |
null |
| 2025-05-01 |
Improving Routing in Sparse Mixture of Experts with Graph of Tokens |
Tam Nguyen et.al. |
2505.00792 |
null |
| 2025-05-01 |
CICADA: Cross-Domain Interpretable Coding for Anomaly Detection and Adaptation in Multivariate Time Series |
Tian Lan et.al. |
2505.00415 |
null |
| 2025-05-01 |
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing |
Piotr Piękos et.al. |
2505.00315 |
link |
| 2025-04-30 |
Online Federation For Mixtures of Proprietary Agents with Black-Box Encoders |
Xuwei Yang et.al. |
2505.00216 |
null |
| 2025-04-29 |
TT-LoRA MoE: Unifying Parameter-Efficient Fine-Tuning and Sparse Mixture-of-Experts |
Pradip Kunwar et.al. |
2504.21190 |
null |
| 2025-04-29 |
Token-Level Prompt Mixture with Parameter-Free Routing for Federated Domain Generalization |
Shuai Gong et.al. |
2504.21063 |
null |
| 2025-04-26 |
PICO: Secure Transformers via Robust Prompt Isolation and Cybersecurity Oversight |
Ben Goertzel et.al. |
2504.21029 |
null |
| 2025-04-29 |
MambaMoE: Mixture-of-Spectral-Spatial-Experts State Space Model for Hyperspectral Image Classification |
Yichu Xu et.al. |
2504.20509 |
null |
| 2025-04-29 |
FT-MoE: Sustainable-learning Mixture of Experts Model for Fault-Tolerant Computing with Multiple Tasks |
Wenjing Xiao et.al. |
2504.20446 |
null |
| 2025-04-29 |
MicarVLMoE: A Modern Gated Cross-Aligned Vision-Language Mixture of Experts Model for Medical Image Captioning and Report Generation |
Amaan Izhar et.al. |
2504.20343 |
link |
| 2025-04-28 |
Accelerating Mixture-of-Experts Training with Adaptive Expert Replication |
Athinagoras Skiadopoulos et.al. |
2504.19925 |
null |
| 2025-04-28 |
Decentralization of Generative AI via Mixture of Experts for Wireless Networks: A Comprehensive Survey |
Yunting Xu et.al. |
2504.19660 |
null |
| 2025-04-28 |
ARTEMIS: Autoregressive End-to-End Trajectory Planning with Mixture of Experts for Autonomous Driving |
Renju Feng et.al. |
2504.19580 |
link |
| 2025-04-29 |
BadMoE: Backdooring Mixture-of-Experts LLMs via Optimizing Routing Triggers and Infecting Dormant Experts |
Qingyue Wang et.al. |
2504.18598 |
null |
| 2025-04-25 |
NoEsis: Differentially Private Knowledge Transfer in Modular LLM Adaptation |
Rob Romijnders et.al. |
2504.18147 |
null |
| 2025-04-28 |
Unveiling the Hidden: Movie Genre and User Bias in Spoiler Detection |
Haokai Zhang et.al. |
2504.17834 |
link |
| 2025-04-22 |
Compass-V2 Technical Report |
Sophia Maria et.al. |
2504.15527 |
null |
| 2025-04-21 |
Manifold Induced Biases for Zero-shot and Few-shot Detection of Generated Images |
Jonathan Brokman et.al. |
2504.15470 |
link |
| 2025-04-17 |
D²MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving
Haodong Wang et.al. |
2504.15299 |
null |
| 2025-04-23 |
MoE Parallel Folding: Heterogeneous Parallelism Mappings for Efficient Large-Scale MoE Model Training with Megatron Core |
Dennis Liu et.al. |
2504.14960 |
null |
| 2025-04-18 |
Multi-Type Context-Aware Conversational Recommender Systems via Mixture-of-Experts |
Jie Zou et.al. |
2504.13655 |
null |
| 2025-04-18 |
HAECcity: Open-Vocabulary Scene Understanding of City-Scale Point Clouds with Superpoint Graph Clustering |
Alexander Rusnak et.al. |
2504.13590 |
null |
| 2025-04-18 |
Dense Backpropagation Improves Training for Sparse Mixture-of-Experts |
Ashwinee Panda et.al. |
2504.12463 |
link |
| 2025-04-16 |
Unveiling Hidden Collaboration within Mixture-of-Experts in Large Language Models |
Yuanbo Tang et.al. |
2504.12359 |
null |
| 2025-04-16 |
Trend Filtered Mixture of Experts for Automated Gating of High-Frequency Flow Cytometry Data |
Sangwon Hyun et.al. |
2504.12287 |
null |
| 2025-04-16 |
MOS: Towards Effective Smart Contract Vulnerability Detection through Mixture-of-Experts Tuning of Large Language Models |
Hang Yuan et.al. |
2504.12234 |
null |
| 2025-04-15 |
Simulation-based inference for stochastic nonlinear mixed-effects models with applications in systems biology |
Henrik Häggström et.al. |
2504.11279 |
link |
| 2025-04-14 |
Correlative and Discriminative Label Grouping for Multi-Label Visual Prompt Tuning |
LeiLei Ma et.al. |
2504.09990 |
null |
| 2025-04-14 |
Multi-objective Bayesian Optimization With Mixed-categorical Design Variables for Expensive-to-evaluate Aeronautical Applications |
Nathalie Bartoli et.al. |
2504.09930 |
null |
| 2025-04-14 |
Plasticity-Aware Mixture of Experts for Learning Under QoE Shifts in Adaptive Video Streaming |
Zhiqiang He et.al. |
2504.09906 |
null |
| 2025-04-13 |
Mixture-of-Shape-Experts (MoSE): End-to-End Shape Dictionary Framework to Prompt SAM for Generalizable Medical Segmentation |
Jia Wei et.al. |
2504.09601 |
null |
| 2025-04-12 |
MoE-Lens: Towards the Hardware Limit of High-Throughput MoE LLM Serving Under Resource Constraints |
Yichao Yuan et.al. |
2504.09345 |
null |
| 2025-04-12 |
Mixture of Group Experts for Learning Invariant Representations |
Lei Kang et.al. |
2504.09265 |
null |
| 2025-04-11 |
RouterKT: Mixture-of-Experts for Knowledge Tracing |
Han Liao et.al. |
2504.08989 |
link |
| 2025-04-11 |
Regularized infill criteria for multi-objective Bayesian optimization with application to aircraft design |
Robin Grapin et.al. |
2504.08671 |
null |
| 2025-04-10 |
C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing |
Zhongyang Li et.al. |
2504.07964 |
link |
| 2025-04-11 |
Scaling Laws for Native Multimodal Models |
Mustafa Shukor et.al. |
2504.07951 |
null |
| 2025-04-10 |
Cluster-Driven Expert Pruning for Mixture-of-Experts Large Language Models |
Hongcheng Guo et.al. |
2504.07807 |
link |
| 2025-04-10 |
Adaptive Detection of Fast Moving Celestial Objects Using a Mixture of Experts and Physical-Inspired Neural Network |
Peng Jia et.al. |
2504.07777 |
null |
| 2025-04-10 |
Kimi-VL Technical Report |
Kimi Team et.al. |
2504.07491 |
link |
| 2025-04-09 |
MoEDiff-SR: Mixture of Experts-Guided Diffusion Model for Region-Adaptive MRI Super-Resolution |
Zhe Wang et.al. |
2504.07308 |
link |
| 2025-04-11 |
Holistic Capability Preservation: Towards Compact Yet Comprehensive Reasoning Models |
Ling Team et.al. |
2504.07158 |
null |
| 2025-04-09 |
Domain-Specific Pruning of Large Mixture-of-Experts Models with Few-shot Demonstrations |
Zican Dong et.al. |
2504.06792 |
null |
| 2025-04-09 |
FedMerge: Federated Personalization via Model Merging |
Shutong Chen et.al. |
2504.06768 |
null |
| 2025-04-08 |
S'MoRE: Structural Mixture of Residual Experts for LLM Fine-tuning |
Hanqing Zeng et.al. |
2504.06426 |
null |
| 2025-04-08 |
HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference |
Shuzhang Zhong et.al. |
2504.05897 |
link |
| 2025-04-08 |
Adaptive Substructure-Aware Expert Model for Molecular Property Prediction |
Tianyi Jiang et.al. |
2504.05844 |
null |
| 2025-04-10 |
Finding Fantastic Experts in MoEs: A Unified Study for Expert Dropping Strategies and Observations |
Ajay Jaiswal et.al. |
2504.05586 |
null |
| 2025-04-07 |
SUEDE:Shared Unified Experts for Physical-Digital Face Attack Detection Enhancement |
Zuying Xie et.al. |
2504.04818 |
null |
| 2025-04-06 |
On the Spatial Structure of Mixture-of-Experts in Transformers |
Daniel Bershatsky et.al. |
2504.04444 |
null |
| 2025-04-05 |
Collaboration and Controversy Among Experts: Rumor Early Detection by Tuning a Comment Generator |
Bing Wang et.al. |
2504.04076 |
link |
| 2025-04-04 |
HeterMoE: Efficient Training of Mixture-of-Experts Models on Heterogeneous GPUs |
Yongji Wu et.al. |
2504.03871 |
null |
| 2025-04-01 |
Detecting Financial Fraud with Hybrid Deep Learning: A Mix-of-Experts Approach to Sequential and Anomalous Patterns |
Diego Vallarino et.al. |
2504.03750 |
null |
| 2025-04-04 |
RingMoE: Mixture-of-Modality-Experts Multi-Modal Foundation Models for Universal Remote Sensing Image Interpretation |
Hanbo Bi et.al. |
2504.03166 |
null |
| 2025-04-03 |
TeleMoM: Consensus-Driven Telecom Intelligence via Mixture of Models |
Xinquan Wang et.al. |
2504.02712 |
null |
| 2025-04-07 |
MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators |
Beichen Huang et.al. |
2504.02658 |
link |
| 2025-04-07 |
MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism |
Ruidong Zhu et.al. |
2504.02263 |
null |
| 2025-04-02 |
Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design |
Mohan Zhang et.al. |
2504.01337 |
null |
| 2025-04-01 |
Mixture-of-Experts for Distributed Edge Computing with Channel-Aware Gating Function |
Qiuchen Song et.al. |
2504.00819 |
null |
| 2025-04-01 |
DynMoLE: Boosting Mixture of LoRA Experts Fine-Tuning with a Hybrid Routing Mechanism |
Dengchun Li et.al. |
2504.00661 |
link |
| 2025-04-01 |
Continual Cross-Modal Generalization |
Yan Xia et.al. |
2504.00561 |
null |
| 2025-04-01 |
Mixture-of-Attack-Experts with Class Regularization for Unified Physical-Digital Face Attack Detection |
Shunxin Chen et.al. |
2504.00458 |
null |
| 2025-03-31 |
Unimodal-driven Distillation in Multimodal Emotion Recognition with Dynamic Fusion |
Jiagen Li et.al. |
2503.23721 |
null |
| 2025-03-30 |
Mixture of Routers |
Jia-Chen Zhang et.al. |
2503.23362 |
null |
| 2025-03-29 |
Beyond Standard MoE: Mixture of Latent Experts for Resource-Efficient Language Models |
Zehua Liu et.al. |
2503.23100 |
null |
| 2025-03-29 |
S2MoE: Robust Sparse Mixture of Experts via Stochastic Learning |
Giang Do et.al. |
2503.23007 |
null |
| 2025-03-29 |
Sparse Mixture of Experts as Unified Competitive Learning |
Giang Do et.al. |
2503.22996 |
null |
| 2025-04-01 |
Exploiting Mixture-of-Experts Redundancy Unlocks Multimodal Generative Abilities |
Raman Dutt et.al. |
2503.22517 |
null |
| 2025-03-27 |
RocketPPA: Ultra-Fast LLM-Based PPA Estimator at Code-Level Abstraction |
Armin Abdollahi et.al. |
2503.21971 |
null |
| 2025-03-27 |
iMedImage Technical Report |
Ran Wei et.al. |
2503.21836 |
null |
| 2025-03-27 |
LLaVA-CMoE: Towards Continual Mixture of Experts for Large Vision-Language Models |
Hengyuan Zhao et.al. |
2503.21227 |
null |
| 2025-03-26 |
Optimal Scaling Laws for Efficiency Gains in a Theoretical Transformer-Augmented Sectional MoE Framework |
Soham Sane et.al. |
2503.20750 |
null |
| 2025-03-26 |
UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines |
Chen Tang et.al. |
2503.20748 |
null |
| 2025-03-26 |
Enhancing Multi-modal Models with Heterogeneous MoE Adapters for Fine-tuning |
Sashuai Zhou et.al. |
2503.20633 |
null |
| 2025-03-26 |
MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation |
Rongyu Zhang et.al. |
2503.20384 |
null |
| 2025-03-26 |
Modality-Independent Brain Lesion Segmentation with Privacy-aware Continual Learning |
Yousef Sadegheih et.al. |
2503.20326 |
link |
| 2025-03-25 |
Resilient Sensor Fusion under Adverse Sensor Failures via Multi-Modal Expert Fusion |
Konyul Park et.al. |
2503.19776 |
null |
| 2025-03-25 |
BiPrompt-SAM: Enhancing Image Segmentation via Explicit Selection between Point and Text Prompts |
Suzhe Xu et.al. |
2503.19769 |
null |
| 2025-03-25 |
M²CD: A Unified MultiModal Framework for Optical-SAR Change Detection with Mixture of Experts and Self-Distillation
Ziyuan Liu et.al. |
2503.19406 |
null |
| 2025-03-27 |
Reimagining Memory Access for LLM Inference: Compression-Aware Memory Controller Design |
Rui Xie et.al. |
2503.18869 |
null |
| 2025-03-24 |
Galaxy Walker: Geometry-aware VLMs For Galaxy-scale Understanding |
Tianyu Chen et.al. |
2503.18578 |
null |
| 2025-03-24 |
SPMTrack: Spatio-Temporal Parameter-Efficient Fine-Tuning with Mixture of Experts for Scalable Visual Tracking |
Wenrui Cai et.al. |
2503.18338 |
link |
| 2025-03-23 |
Challenging Dataset and Multi-modal Gated Mixture of Experts Model for Remote Sensing Copy-Move Forgery Understanding |
Ze Zhang et.al. |
2503.18104 |
link |
| 2025-03-22 |
Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM |
Codefuse et.al. |
2503.17793 |
null |
| 2025-03-25 |
Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts |
Yike Yuan et.al. |
2503.16057 |
null |
| 2025-03-21 |
UniCoRN: Latent Diffusion-based Unified Controllable Image Restoration Network across Multiple Degradations |
Debabrata Mandal et.al. |
2503.15868 |
null |
| 2025-05-27 |
Mixture of Lookup Experts |
Shibo Jie et.al. |
2503.15798 |
null |
| 2025-03-21 |
Leveraging MoE-based Large Language Model for Zero-Shot Multi-Task Semantic Communication |
Sin-Yu Huang et.al. |
2503.15722 |
null |
| 2025-03-19 |
SemEval-2025 Task 1: AdMIRe -- Advancing Multimodal Idiomaticity Representation |
Thomas Pickard et.al. |
2503.15358 |
null |
| 2025-03-21 |
Body-Hand Modality Expertized Networks with Cross-attention for Fine-grained Skeleton Action Recognition |
Seungyeon Cho et.al. |
2503.14960 |
null |
| 2025-03-18 |
Core-Periphery Principle Guided State Space Model for Functional Connectome Classification |
Minheng Chen et.al. |
2503.14655 |
null |
| 2025-03-18 |
MAST-Pro: Dynamic Mixture-of-Experts for Adaptive Segmentation of Pan-Tumors with Knowledge-Driven Prompts |
Runqi Meng et.al. |
2503.14355 |
null |
| 2025-03-18 |
SNAKE: A Sustainable and Multi-functional Traffic Analysis System utilizing Specialized Large-Scale Models with a Mixture of Experts Architecture |
Tian Qin et.al. |
2503.13808 |
null |
| 2025-03-17 |
Optimal Expert Selection for Distributed Mixture-of-Experts at the Wireless Edge |
Shengling Qin et.al. |
2503.13421 |
null |
| 2025-03-17 |
Channel Estimation for Pinching-Antenna Systems (PASS) |
Jian Xiao et.al. |
2503.13268 |
null |
| 2025-03-17 |
Federated Mixture-of-Expert for Non-Overlapped Cross-Domain Sequential Recommendation |
Yu Liu et.al. |
2503.13254 |
null |
| 2025-03-16 |
Fast filtering of non-Gaussian models using Amortized Optimal Transport Maps |
Mohammad Al-Jarrah et.al. |
2503.12633 |
link |
| 2025-03-16 |
MoECollab: Democratizing LLM Development Through Collaborative Mixture of Experts |
Harshit et.al. |
2503.12592 |
null |
| 2025-03-16 |
MExD: An Expert-Infused Diffusion Model for Whole-Slide Image Classification |
Jianwei Zhao et.al. |
2503.12401 |
null |
| 2025-03-15 |
Adaptive Mixture of Experts Learning for Robust Audio Spoofing Detection |
Qixian Chen et.al. |
2503.12010 |
null |
| 2025-03-14 |
FedALT: Federated Fine-Tuning through Adaptive Local Training with Rest-of-the-World LoRA |
Jieming Bian et.al. |
2503.11880 |
null |
| 2025-03-14 |
A Review of DeepSeek Models' Key Innovative Techniques |
Chengen Wang et.al. |
2503.11486 |
null |
| 2025-03-14 |
MoLEx: Mixture of Layer Experts for Finetuning with Sparse Upcycling |
Rachel S. Y. Teo et.al. |
2503.11144 |
link |
| 2025-03-13 |
Samoyeds: Accelerating MoE Models with Structured Sparsity Leveraging Sparse Tensor Cores |
Chenpeng Wu et.al. |
2503.10725 |
link |
| 2025-03-14 |
dFLMoE: Decentralized Federated Learning via Mixture of Experts for Medical Data Analysis |
Luyuan Xie et.al. |
2503.10412 |
null |
| 2025-03-13 |
StableFusion: Continual Video Retrieval via Frame Adaptation |
Zecheng Zhao et.al. |
2503.10111 |
link |
| 2025-03-12 |
Double-Stage Feature-Level Clustering-Based Mixture of Experts Framework |
Bakary Badjie et.al. |
2503.09504 |
null |
| 2025-03-12 |
Towards Robust Multimodal Representation: A Unified Approach with Adaptive Experts and Alignment |
Nazanin Moradinasab et.al. |
2503.09498 |
link |
| 2025-03-12 |
Astrea: A MOE-based Visual Understanding Model with Progressive Alignment |
Xiaoda Yang et.al. |
2503.09445 |
null |
| 2025-03-12 |
Automatic Operator-level Parallelism Planning for Distributed Deep Learning -- A Mixed-Integer Programming Approach |
Ruifeng She et.al. |
2503.09357 |
null |
| 2025-03-12 |
Priority-Aware Preemptive Scheduling for Mixed-Priority Workloads in MoE Inference |
Mohammad Siavashi et.al. |
2503.09304 |
null |
| 2025-03-13 |
FaVChat: Unlocking Fine-Grained Facial Video Understanding with Multimodal Large Language Models |
Fufangchen Zhao et.al. |
2503.09158 |
null |
| 2025-03-11 |
MoE-Loco: Mixture of Experts for Multitask Locomotion |
Runhan Huang et.al. |
2503.08564 |
null |
| 2025-03-11 |
Accelerating MoE Model Inference with Expert Sharding |
Oana Balmau et.al. |
2503.08467 |
null |
| 2025-03-11 |
UniF²ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models
Junzhe Li et.al. |
2503.08120 |
null |
| 2025-03-11 |
MoRE: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models |
Han Zhao et.al. |
2503.08007 |
null |
| 2025-03-10 |
GM-MoE: Low-Light Enhancement with Gated-Mechanism Mixture-of-Experts |
Minwen Liao et.al. |
2503.07417 |
null |
| 2025-03-10 |
A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications |
Siyuan Mu et.al. |
2503.07137 |
link |
| 2025-03-10 |
VMTS: Vision-Assisted Teacher-Student Reinforcement Learning for Multi-Terrain Locomotion in Bipedal Robots |
Fu Chen et.al. |
2503.07049 |
link |
| 2025-03-10 |
ResMoE: Space-efficient Compression of Mixture of Experts LLMs via Residual Restoration |
Mengting Ai et.al. |
2503.06881 |
link |
| 2025-03-10 |
eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference |
Suraiya Tairin et.al. |
2503.06823 |
null |
| 2025-03-09 |
MoFE: Mixture of Frozen Experts Architecture |
Jean Seo et.al. |
2503.06491 |
null |
| 2025-03-09 |
Swift Hydra: Self-Reinforcing Generative Framework for Anomaly Detection with Multiple Mamba Models |
Nguyen Do et.al. |
2503.06413 |
link |
| 2025-03-08 |
MoEMoE: Question Guided Dense and Scalable Sparse Mixture-of-Expert for Multi-source Multi-modal Answering |
Vinay Kumar Verma et.al. |
2503.06296 |
null |
| 2025-03-08 |
A Novel Trustworthy Video Summarization Algorithm Through a Mixture of LoRA Experts |
Wenzhuo Du et.al. |
2503.06064 |
null |
| 2025-03-08 |
MANDARIN: Mixture-of-Experts Framework for Dynamic Delirium and Coma Prediction in ICU Patients: Development and Validation of an Acute Brain Dysfunction Prediction Model |
Miguel Contreras et.al. |
2503.06059 |
null |
| 2025-03-07 |
Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning |
Justin Chih-Yao Chen et.al. |
2503.05641 |
null |
| 2025-03-07 |
FMT:A Multimodal Pneumonia Detection Model Based on Stacking MOE Framework |
Jingyu Xu et.al. |
2503.05626 |
null |
| 2025-03-07 |
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts |
Weigao Sun et.al. |
2503.05447 |
link |
| 2025-03-07 |
Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs |
Ling Team et.al. |
2503.05139 |
null |
| 2025-03-07 |
Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts |
Shwai He et.al. |
2503.05066 |
null |
| 2025-03-06 |
Continual Pre-training of MoEs: How robust is your router? |
Benjamin Thérien et.al. |
2503.05029 |
null |
| 2025-03-06 |
Predictable Scale: Part I -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining |
Houyi Li et.al. |
2503.04715 |
null |
| 2025-03-07 |
Question-Aware Gaussian Experts for Audio-Visual Question Answering |
Hongyeob Kim et.al. |
2503.04459 |
link |
| 2025-03-07 |
Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling |
Yan Li et.al. |
2503.04398 |
null |
| 2025-03-06 |
A Generalist Cross-Domain Molecular Learning Framework for Structure-Based Drug Discovery |
Yiheng Zhu et.al. |
2503.04362 |
null |
| 2025-03-06 |
DM-Adapter: Domain-Aware Mixture-of-Adapters for Text-Based Person Retrieval |
Yating Liu et.al. |
2503.04144 |
null |
| 2025-03-05 |
VoiceGRPO: Modern MoE Transformers with Group Relative Policy Optimization GRPO for AI Voice Health Care Applications on Voice Pathology Detection |
Enkhtogtokh Togootogtokh et.al. |
2503.03797 |
link |
| 2025-03-05 |
Small but Mighty: Enhancing Time Series Forecasting with Lightweight LLMs |
Haoran Fan et.al. |
2503.03594 |
link |
| 2025-03-06 |
Convergence Rates for Softmax Gating Mixture of Experts |
Huy Nguyen et.al. |
2503.03213 |
null |
| 2025-03-04 |
MX-Font++: Mixture of Heterogeneous Aggregation Experts for Few-shot Font Generation |
Weihang Wang et.al. |
2503.02799 |
link |
| 2025-03-04 |
FinArena: A Human-Agent Collaboration Framework for Financial Market Analysis and Forecasting |
Congluo Xu et.al. |
2503.02692 |
null |
| 2025-03-04 |
Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer |
Yujiao Yang et.al. |
2503.02495 |
link |
| 2025-03-04 |
Tabby: Tabular Data Synthesis with Language Models |
Sonia Cromp et.al. |
2503.02152 |
null |
| 2025-03-03 |
ECG-EmotionNet: Nested Mixture of Expert (NMoE) Adaptation of ECG-Foundation Model for Driver Emotion Recognition |
Nastaran Mansourian et.al. |
2503.01750 |
null |
| 2025-03-03 |
Effective High-order Graph Representation Learning for Credit Card Fraud Detection |
Yao Zou et.al. |
2503.01556 |
null |
| 2025-03-03 |
DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models |
Yongqi Huang et.al. |
2503.01359 |
null |
| 2025-03-03 |
PROPER: A Progressive Learning Framework for Personalized Large Language Models with Group-Level Adaptation |
Linhai Zhang et.al. |
2503.01303 |
null |
| 2025-03-03 |
Unify and Anchor: A Context-Aware Transformer for Cross-Domain Time Series Forecasting |
Xiaobin Hong et.al. |
2503.01157 |
null |
| 2025-03-02 |
Explainable Classifier for Malignant Lymphoma Subtyping via Cell Graph and Image Fusion |
Daiki Nishiyama et.al. |
2503.00925 |
null |
| 2025-03-01 |
R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts |
Zhongyang Li et.al. |
2502.20395 |
link |
| 2025-02-27 |
Mixture of Experts for Recognizing Depression from Interview and Reading Tasks |
Loukas Ilias et.al. |
2502.20213 |
null |
| 2025-02-27 |
Mixture of Experts-augmented Deep Unfolding for Activity Detection in IRS-aided Systems |
Zeyi Ren et.al. |
2502.20183 |
null |
| 2025-02-27 |
UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook |
Yidi Jiang et.al. |
2502.20067 |
null |
| 2025-03-01 |
Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts |
Shulai Zhang et.al. |
2502.19811 |
link |
| 2025-02-26 |
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization |
Taishi Nakamura et.al. |
2502.19261 |
null |
| 2025-02-26 |
OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment |
Jiaxin Deng et.al. |
2502.18965 |
null |
| 2025-02-25 |
Generative AI-enabled Wireless Communications for Robust Low-Altitude Economy Networking |
Changyuan Zhao et.al. |
2502.18118 |
null |
| 2025-02-24 |
The Empirical Impact of Reducing Symmetries on the Performance of Deep Ensembles and MoE |
Andrei Chernov et.al. |
2502.17391 |
null |
| 2025-02-24 |
Delta Decompression for MoE-based LLMs Compression |
Hao Gu et.al. |
2502.17298 |
link |
| 2025-02-24 |
Evaluating Expert Contributions in a MoE LLM for Quiz-Based Tasks |
Andrei Chernov et.al. |
2502.17187 |
null |
| 2025-02-24 |
Muon is Scalable for LLM Training |
Jingyuan Liu et.al. |
2502.16982 |
link |
| 2025-02-24 |
BigMac: A Communication-Efficient Mixture-of-Experts Model Structure for Fast Training and Inference |
Zewen Jin et.al. |
2502.16927 |
null |
| 2025-02-24 |
ENACT-Heart -- ENsemble-based Assessment Using CNN and Transformer on Heart Sounds |
Jiho Han et.al. |
2502.16914 |
null |
| 2025-02-26 |
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment |
Chenghao Fan et.al. |
2502.16894 |
link |
| 2025-02-22 |
An Autonomous Network Orchestration Framework Integrating Large Language Models with Continual Reinforcement Learning |
Masoud Shokrnezhad et.al. |
2502.16198 |
null |
| 2025-02-21 |
A fast convergence algorithm based on binary integer programming for expert load balancing in MoE LLMs |
Yuan Sun et.al. |
2502.15451 |
link |
| 2025-02-21 |
Tight Clusters Make Specialized Experts |
Stefan K. Nielsen et.al. |
2502.15315 |
link |
| 2025-02-21 |
Multimodal Graph-Based Variational Mixture of Experts Network for Zero-Shot Multimodal Information Extraction |
Baohang Zhou et.al. |
2502.15290 |
link |
| 2025-02-20 |
Ray-Tracing for Conditionally Activated Neural Networks |
Claudio Gallicchio et.al. |
2502.14788 |
null |
| 2025-02-21 |
ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model |
Zhongyi Zhou et.al. |
2502.14420 |
link |
| 2025-02-19 |
Unraveling the Localized Latents: Learning Stratified Manifold Structures in LLM Embedding Space with Sparse Mixture-of-Experts |
Xin Li et.al. |
2502.13577 |
null |
| 2025-02-18 |
MoBA: Mixture of Block Attention for Long-Context LLMs |
Enzhe Lu et.al. |
2502.13189 |
link |
| 2025-02-18 |
Every Expert Matters: Towards Effective Knowledge Distillation for Mixture-of-Experts Language Models |
Gyeongman Kim et.al. |
2502.12947 |
null |
| 2025-02-18 |
DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs |
Minxuan Lv et.al. |
2502.12455 |
null |
| 2025-02-17 |
From Dense to Dynamic: Token-Difficulty Driven MoEfication of Pre-Trained LLMs |
Kumari Nishu et.al. |
2502.12325 |
null |
| 2025-02-17 |
Accurate Expert Predictions in MoE Inference via Cross-Layer Gate |
Zhiyuan Fang et.al. |
2502.12224 |
null |
| 2025-02-17 |
How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines |
Ayan Sengupta et.al. |
2502.12051 |
null |
| 2025-02-17 |
Connector-S: A Survey of Connectors in Multi-modal Large Language Models |
Xun Zhu et.al. |
2502.11453 |
null |
| 2025-02-16 |
Mixture of Tunable Experts - Behavior Modification of DeepSeek-R1 at Inference Time |
Robert Dahlke et.al. |
2502.11096 |
null |
| 2025-02-16 |
ClimateLLM: Efficient Weather Forecasting via Frequency-Aware Large Language Models |
Shixuan Li et.al. |
2502.11059 |
null |
| 2025-02-15 |
Semantic Specialization in MoE Appears with Scale: A Study of DeepSeek R1 Expert Specialization |
Matthew Lyle Olson et.al. |
2502.10928 |
null |
| 2025-02-12 |
Heterogeneous Mixture of Experts for Remote Sensing Image Super-Resolution |
Bowen Chen et.al. |
2502.09654 |
link |
| 2025-02-14 |
Eidetic Learning: an Efficient and Provable Solution to Catastrophic Forgetting |
Nicholas Dronen et.al. |
2502.09500 |
link |
| 2025-02-12 |
The MoE-Empowered Edge LLMs Deployment: Architecture, Challenges, and Opportunities |
Ning Li et.al. |
2502.08381 |
null |
| 2025-02-12 |
Mixture of Decoupled Message Passing Experts with Entropy Constraint for General Node Classification |
Xuanze Chen et.al. |
2502.08083 |
null |
| 2025-02-13 |
Training Sparse Mixture Of Experts Text Embedding Models |
Zach Nussbaum et.al. |
2502.07972 |
link |
| 2025-02-11 |
Memory Analysis on the Training Course of DeepSeek Models |
Ping Zhang et.al. |
2502.07846 |
null |
| 2025-02-11 |
MoENAS: Mixture-of-Expert based Neural Architecture Search for jointly Accurate, Fair, and Robust Edge Deep Neural Networks |
Lotfi Abdelkrim Mecharbat et.al. |
2502.07422 |
null |
| 2025-02-11 |
Online Aggregation of Trajectory Predictors |
Alex Tong et.al. |
2502.07178 |
null |
| 2025-02-09 |
Klotski: Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline |
Zhiyuan Fang et.al. |
2502.06888 |
null |
| 2025-02-10 |
MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing |
Seokjin Go et.al. |
2502.06643 |
null |
| 2025-02-10 |
Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE |
Haiduo Huang et.al. |
2502.06282 |
link |
| 2025-02-10 |
Fair-MoE: Fairness-Oriented Mixture of Experts in Vision-Language Models |
Peiran Wang et.al. |
2502.06094 |
null |
| 2025-02-08 |
Mol-MoE: Training Preference-Guided Routers for Molecule Generation |
Diego Calanzone et.al. |
2502.05633 |
link |
| 2025-02-08 |
UbiMoE: A Ubiquitous Mixture-of-Experts Vision Transformer Accelerator With Hybrid Computation Pattern on FPGA |
Jiale Dong et.al. |
2502.05602 |
link |
| 2025-02-07 |
fMoE: Fine-Grained Expert Offloading for Large Mixture-of-Experts Serving |
Hanfei Yu et.al. |
2502.05370 |
null |
| 2025-02-07 |
Towards Foundational Models for Dynamical System Reconstruction: Hierarchical Meta-Learning via Mixture of Experts |
Roussel Desmond Nzoyem et.al. |
2502.05335 |
null |
| 2025-02-07 |
Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient |
Jan Ludziejewski et.al. |
2502.05172 |
null |
| 2025-02-06 |
Mixture of neural operator experts for learning boundary conditions and model selection |
Dwyer Deighan et.al. |
2502.04562 |
null |
| 2025-02-06 |
CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference |
Zehua Pei et.al. |
2502.04416 |
link |
| 2025-02-06 |
Rank Also Matters: Hierarchical Configuration for Mixture of Adapter Experts in LLM Fine-Tuning |
Peizhuang Cong et.al. |
2502.03884 |
null |
| 2025-02-05 |
(GG) MoE vs. MLP on Tabular Data |
Andrei Chernov et.al. |
2502.03608 |
null |
| 2025-02-05 |
RepLoRA: Reparameterizing Low-Rank Adaptation via the Perspective of Mixture of Experts |
Tuan Truong et.al. |
2502.03044 |
null |
| 2025-02-05 |
On Zero-Initialized Attention: Optimal Prompt and Gating Factor Estimation |
Nghiem T. Diep et.al. |
2502.03029 |
null |
| 2025-02-05 |
Scaling Laws for Upcycling Mixture-of-Experts Language Models |
Seng Pei Liew et.al. |
2502.03009 |
null |
| 2025-02-04 |
ReGNet: Reciprocal Space-Aware Long-Range Modeling and Multi-Property Prediction for Crystals |
Jianan Nie et.al. |
2502.02748 |
null |
| 2025-02-04 |
Hecate: Unlocking Efficient Sparse Model Training via Fully Sharded Sparse Data Parallelism |
Yuhao Qing et.al. |
2502.02581 |
null |
| 2025-02-05 |
Brief analysis of DeepSeek R1 and its implications for Generative AI |
Sarah Mercer et.al. |
2502.02523 |
null |
| 2025-02-04 |
M2R2: Mixture of Multi-Rate Residuals for Efficient Transformer Inference |
Nikhil Bhendawade et.al. |
2502.02040 |
null |
| 2025-02-05 |
MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation |
Haibo Tong et.al. |
2502.01719 |
null |
| 2025-02-04 |
MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs |
Yuhang Zhou et.al. |
2502.00997 |
null |
| 2025-02-03 |
CLIP-UP: A Simple and Efficient Mixture-of-Experts CLIP Training Recipe with Sparse Upcycling |
Xinze Wang et.al. |
2502.00965 |
null |
| 2025-02-02 |
UniGraph2: Learning a Unified Embedding Space to Bind Multimodal Graphs |
Yufei He et.al. |
2502.00806 |
link |
| 2025-02-02 |
Distribution-aware Fairness Learning in Medical Image Segmentation From A Control-Theoretic Perspective |
Yujin Oh et.al. |
2502.00619 |
link |
| 2025-02-01 |
PM-MOE: Mixture of Experts on Private Model Parameters for Personalized Federated Learning |
Yu Feng et.al. |
2502.00354 |
link |
| 2025-02-01 |
Sigmoid Self-Attention is Better than Softmax Self-Attention: A Mixture-of-Experts Perspective |
Fanqi Yan et.al. |
2502.00281 |
null |
| 2025-01-31 |
Pheromone-based Learning of Optimal Reasoning Paths |
Anirudh Chari et.al. |
2501.19278 |
null |
| 2025-01-31 |
Adaptive Prompt: Unlocking the Power of Visual Prompt Tuning |
Minh Le et.al. |
2501.18936 |
null |
| 2025-01-30 |
MolGraph-xLSTM: A graph-based dual-level xLSTM framework with multi-head mixture-of-experts for enhanced molecular representation and interpretability |
Yan Sun et.al. |
2501.18439 |
null |
| 2025-01-29 |
Free Agent in Agent-Based Mixture-of-Experts Generative AI Framework |
Jung-Hua Liu et.al. |
2501.17903 |
null |
| 2025-01-29 |
Heuristic-Informed Mixture of Experts for Link Prediction in Multilayer Networks |
Lucio La Cava et.al. |
2501.17557 |
null |
| 2025-01-28 |
3D-MoE: A Mixture-of-Experts Multi-modal LLM for 3D Vision and Pose Diffusion via Rectified Flow |
Yueen Ma et.al. |
2501.16698 |
null |
| 2025-01-27 |
MoEVD: Enhancing Vulnerability Detection by Mixture-of-Experts (MoE) |
Xu Yang et.al. |
2501.16454 |
null |
| 2025-01-27 |
Static Batching of Irregular Workloads on GPUs: Framework and Application to Efficient MoE Model Inference |
Yinghan Li et.al. |
2501.16103 |
null |
| 2025-01-25 |
ToMoE: Converting Dense Large Language Models to Mixture-of-Experts through Dynamic Structural Pruning |
Shangqian Gao et.al. |
2501.15316 |
null |
| 2025-01-25 |
FreqMoE: Enhancing Time Series Forecasting through Frequency Decomposition Mixture of Experts |
Ziqi Liu et.al. |
2501.15125 |
link |
| 2025-01-25 |
Each Rank Could be an Expert: Single-Ranked Mixture of Experts LoRA for Multi-Task Learning |
Ziyu Zhao et.al. |
2501.15103 |
null |
| 2025-01-24 |
Mean-field limit from general mixtures of experts to quantum neural networks |
Anderson Melchor Hernandez et.al. |
2501.14660 |
null |
| 2025-01-24 |
Hierarchical Time-Aware Mixture of Experts for Multi-Modal Sequential Recommendation |
Shengzhe Zhang et.al. |
2501.14269 |
link |
| 2025-01-24 |
Sparse Mixture-of-Experts for Non-Uniform Noise Reduction in MRI Images |
Zeyun Deng et.al. |
2501.14198 |
null |
| 2025-01-23 |
CSAOT: Cooperative Multi-Agent System for Active Object Tracking |
Hy Nguyen et.al. |
2501.13994 |
null |
| 2025-01-22 |
Autonomy-of-Experts Models |
Ang Lv et.al. |
2501.13074 |
null |
| 2025-01-22 |
LLM4WM: Adapting LLM for Wireless Multi-Tasking |
Xuanyu Liu et.al. |
2501.12983 |
null |
| 2025-01-22 |
UniUIR: Considering Underwater Image Restoration as An All-in-One Learner |
Xu Zhang et.al. |
2501.12981 |
null |
| 2025-01-22 |
BLR-MoE: Boosted Language-Routing Mixture of Experts for Domain-Robust Multilingual E2E ASR |
Guodong Ma et.al. |
2501.12602 |
null |
| 2025-01-21 |
Modality Interactive Mixture-of-Experts for Fake News Detection |
Yifan Liu et.al. |
2501.12431 |
link |
| 2025-01-21 |
SCFCRC: Simultaneously Counteract Feature Camouflage and Relation Camouflage for Fraud Detection |
Xiaocheng Zhang et.al. |
2501.12430 |
null |
| 2025-01-21 |
Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models |
Samira Abnar et.al. |
2501.12370 |
null |
| 2025-01-21 |
MoGERNN: An Inductive Traffic Predictor for Unobserved Locations in Dynamic Sensing Networks |
Qishen Zhou et.al. |
2501.12281 |
link |
| 2025-01-21 |
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models |
Zihan Qiu et.al. |
2501.11873 |
null |
| 2025-01-18 |
FSMoE: A Flexible and Scalable Training System for Sparse Mixture-of-Experts Models |
Xinglin Pan et.al. |
2501.10714 |
null |
| 2025-01-17 |
OMoE: Diversifying Mixture of Low-Rank Adaptation by Orthogonal Finetuning |
Jinyuan Feng et.al. |
2501.10062 |
null |
| 2025-01-17 |
LLM-Based Routing in Mixture of Experts: A Novel Framework for Trading |
Kuan-Ming Liu et.al. |
2501.09636 |
null |
| 2025-01-14 |
MiniMax-01: Scaling Foundation Models with Lightning Attention |
MiniMax et.al. |
2501.08313 |
null |
| 2025-01-14 |
GRAPHMOE: Amplifying Cognitive Depth of Mixture-of-Experts Network via Introducing Self-Rethinking Mechanism |
Chen Tang et.al. |
2501.07890 |
null |
| 2025-01-18 |
PSReg: Prior-guided Sparse Mixture of Experts for Point Cloud Registration |
Xiaoshui Huang et.al. |
2501.07762 |
null |
| 2025-01-13 |
A Multi-Modal Deep Learning Framework for Pan-Cancer Prognosis |
Binyu Zhang et.al. |
2501.07016 |
link |
| 2025-01-12 |
Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning |
Hanwen Zhong et.al. |
2501.06884 |
link |
| 2025-01-10 |
TAMER: A Test-Time Adaptive MoE-Driven Framework for EHR Representation Learning |
Yinghao Zhu et.al. |
2501.05661 |
link |
| 2025-01-09 |
Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing |
Mengfan Liu et.al. |
2501.05313 |
null |
| 2025-01-07 |
LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes |
Xiang Xu et.al. |
2501.04004 |
link |
| 2025-01-07 |
mFabric: An Efficient and Scalable Fabric for Mixture-of-Experts Training |
Xudong Liao et.al. |
2501.03905 |
null |
| 2025-01-08 |
Mixture-of-Experts Graph Transformers for Interpretable Particle Collision Detection |
Donatella Genovese et.al. |
2501.03432 |
null |
| 2025-01-12 |
Fresh-CL: Feature Realignment through Experts on Hypersphere in Continual Learning |
Zhongyi Zhou et.al. |
2501.02198 |
null |
| 2025-01-03 |
MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders |
Jiajun Cao et.al. |
2501.01709 |
null |
| 2025-01-01 |
REM: A Scalable Reinforced Multi-Expert Framework for Multiplex Influence Maximization |
Huyen Nguyen et.al. |
2501.00779 |
null |
| 2025-01-06 |
Superposition in Transformers: A Novel Way of Building Mixture of Experts |
Ayoub Ben Chaliah et.al. |
2501.00530 |
link |
| 2024-12-31 |
CNC: Cross-modal Normality Constraint for Unsupervised Multi-class Anomaly Detection |
Xiaolei Wang et.al. |
2501.00346 |
null |
| 2024-12-29 |
Multimodal Variational Autoencoder: a Barycentric View |
Peijie Qiu et.al. |
2412.20487 |
null |
| 2024-12-29 |
A Comprehensive Framework for Reliable Legal AI: Combining Specialized Expert Systems and Adaptive Refinement |
Sidra Nasir et.al. |
2412.20468 |
null |
| 2024-12-28 |
UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity |
Jingbo Lin et.al. |
2412.20157 |
link |
| 2024-12-28 |
Distilled Transformers with Locally Enhanced Global Representations for Face Forgery Detection |
Yaning Zhang et.al. |
2412.20156 |
null |
| 2024-12-27 |
DeepSeek-V3 Technical Report |
DeepSeek-AI et.al. |
2412.19437 |
link |
| 2024-12-26 |
AskChart: Universal Chart Understanding through Textual Enhancement |
Xudong Yang et.al. |
2412.19146 |
link |
| 2024-12-30 |
Graph Mixture of Experts and Memory-augmented Routers for Multivariate Time Series Anomaly Detection |
Xiaoyu Huang et.al. |
2412.19108 |
null |
| 2024-12-24 |
Modeling the Centaur: Human-Machine Synergy in Sequential Decision Making |
David Shoresh et.al. |
2412.18593 |
link |
| 2024-12-24 |
BIG-MoE: Bypass Isolated Gating MoE for Generalized Multimodal Face Anti-Spoofing |
Yingjie Ma et.al. |
2412.18065 |
link |
| 2024-12-23 |
UME: Upcycling Mixture-of-Experts for Scalable and Efficient Automatic Speech Recognition |
Li Fu et.al. |
2412.17507 |
null |
| 2024-12-23 |
BrainMAP: Learning Multiple Activation Pathways in Brain Networks |
Song Wang et.al. |
2412.17404 |
link |
| 2024-12-22 |
Part-Of-Speech Sensitivity of Routers in Mixture of Experts Models |
Elie Antoine et.al. |
2412.16971 |
null |
| 2024-12-20 |
Theory of Mixture-of-Experts for Mobile Edge Computing |
Hongbo Li et.al. |
2412.15690 |
null |
| 2024-12-19 |
MoEtion: Efficient and Reliable Checkpointing for Mixture-of-Experts Models at Scale |
Swapnil Gandhi et.al. |
2412.15411 |
null |
| 2024-12-19 |
Qwen2.5 Technical Report |
Qwen et.al. |
2412.15115 |
link |
| 2024-12-19 |
ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing |
Ziteng Wang et.al. |
2412.14711 |
link |
| 2024-12-18 |
A Survey on Inference Optimization Techniques for Mixture of Experts Models |
Jiacheng Liu et.al. |
2412.14219 |
link |
| 2024-12-18 |
SEKE: Specialised Experts for Keyword Extraction |
Matej Martinc et.al. |
2412.14087 |
link |
| 2024-12-18 |
MedCoT: Medical Chain of Thought via Hierarchical Expert |
Jiaxiang Liu et.al. |
2412.13736 |
link |
| 2024-12-17 |
SMOSE: Sparse Mixture of Shallow Experts for Interpretable Reinforcement Learning in Continuous Control Tasks |
Mátyás Vincze et.al. |
2412.13053 |
link |
| 2024-12-17 |
Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning |
Moritz Reuss et.al. |
2412.12953 |
null |
| 2024-12-17 |
CAMEL: Cross-Attention Enhanced Mixture-of-Experts and Language Bias for Code-Switching Speech Recognition |
He Wang et.al. |
2412.12760 |
null |
| 2024-12-16 |
Investigating Mixture of Experts in Dense Retrieval |
Effrosyni Sokli et.al. |
2412.11864 |
null |
| 2024-12-18 |
Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture |
Jingze Shi et.al. |
2412.11834 |
link |
| 2024-12-16 |
Towards Adversarial Robustness of Model-Level Mixture-of-Experts Architectures for Semantic Segmentation |
Svetlana Pavlitska et.al. |
2412.11608 |
link |
| 2024-12-16 |
Enhancing Healthcare Recommendation Systems with a Multimodal LLMs-based MOE Architecture |
Jingyu Xu et.al. |
2412.11557 |
null |
| 2024-12-14 |
DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification |
Yuhao Wang et.al. |
2412.10650 |
link |
| 2024-12-13 |
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding |
Zhiyu Wu et.al. |
2412.10302 |
link |
| 2024-12-13 |
Llama 3 Meets MoE: Efficient Upcycling |
Aditya Vavre et.al. |
2412.09952 |
link |
| 2024-12-12 |
Memory Layers at Scale |
Vincent-Pierre Berges et.al. |
2412.09764 |
link |
| 2024-12-12 |
Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine |
Xiaoshuang Huang et.al. |
2412.09278 |
link |
| 2024-12-12 |
Adaptive Prompting for Continual Relation Extraction: A Within-Task Variance Perspective |
Minh Le et.al. |
2412.08285 |
null |
| 2024-12-11 |
Mixture of Experts Meets Decoupled Message Passing: Towards General and Adaptive Node Classification |
Xuanze Chen et.al. |
2412.08193 |
link |
| 2024-12-10 |
MoE-CAP: Cost-Accuracy-Performance Benchmarking for Mixture-of-Experts Systems |
Yao Fu et.al. |
2412.07067 |
null |
| 2024-12-07 |
Partition of Unity Physics-Informed Neural Networks (POU-PINNs): An Unsupervised Framework for Physics-Informed Domain Decomposition and Mixtures of Experts |
Arturo Rodriguez et.al. |
2412.06842 |
null |
| 2024-12-09 |
Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset |
Xiao Wang et.al. |
2412.06647 |
link |
| 2024-12-09 |
UniPaint: Unified Space-time Video Inpainting via Mixture-of-Experts |
Zhen Wan et.al. |
2412.06340 |
null |
| 2024-12-08 |
Hallucination-aware Optimization for Large Language Model-empowered Communications |
Yinqiu Liu et.al. |
2412.06007 |
link |
| 2024-12-10 |
An Entailment Tree Generation Approach for Multimodal Multi-Hop Question Answering with Mixture-of-Experts and Iterative Feedback Mechanism |
Qing Zhang et.al. |
2412.05821 |
null |
| 2024-12-10 |
RSUniVLM: A Unified Vision Language Model for Remote Sensing via Granularity-oriented Mixture of Experts |
Xu Liu et.al. |
2412.05679 |
link |
| 2024-12-07 |
SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts |
Gengze Zhou et.al. |
2412.05552 |
link |
| 2024-12-07 |
Towards 3D Acceleration for low-power Mixture-of-Experts and Multi-Head Attention Spiking Transformers |
Boxun Xu et.al. |
2412.05540 |
null |
| 2024-12-06 |
Steps are all you need: Rethinking STEM Education with Prompt Engineering |
Krishnasai Addala et.al. |
2412.05023 |
null |
| 2024-12-09 |
Monet: Mixture of Monosemantic Experts for Transformers |
Jungwoo Park et.al. |
2412.04139 |
link |
| 2024-12-05 |
Meta-Reinforcement Learning With Mixture of Experts for Generalizable Multi Access in Heterogeneous Wireless Networks |
Zhaoyang Liu et.al. |
2412.03850 |
null |
| 2024-12-04 |
Convolutional Neural Networks and Mixture of Experts for Intrusion Detection in 5G Networks and beyond |
Loukas Ilias et.al. |
2412.03483 |
null |
| 2024-12-05 |
MQFL-FHE: Multimodal Quantum Federated Learning Framework with Fully Homomorphic Encryption |
Siddhant Dutta et.al. |
2412.01858 |
null |
| 2024-12-05 |
Yi-Lightning Technical Report |
01. AI et.al. |
2412.01253 |
null |
| 2024-11-30 |
Mixture of Experts for Node Classification |
Yu Shi et.al. |
2412.00418 |
null |
| 2024-11-30 |
HiMoE: Heterogeneity-Informed Mixture-of-Experts for Fair Spatial-Temporal Forecasting |
Shaohan Yu et.al. |
2412.00316 |
null |
| 2024-11-27 |
Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference |
Andrii Skliar et.al. |
2412.00099 |
null |
| 2024-11-29 |
LaVIDE: A Language-Vision Discriminator for Detecting Changes in Satellite Image with Map References |
Shuguo Jiang et.al. |
2411.19758 |
null |
| 2024-11-28 |
On the effectiveness of discrete representations in sparse mixture of experts |
Giang Do et.al. |
2411.19402 |
null |
| 2024-11-28 |
Bayesian Cluster Weighted Gaussian Models |
Panagiotis Papastamoulis et.al. |
2411.18957 |
link |
| 2024-11-27 |
UOE: Unlearning One Expert Is Enough For Mixture-of-experts LLMS |
Haomin Zhuang et.al. |
2411.18797 |
null |
| 2024-11-27 |
Complexity Experts are Task-Discriminative Learners for Any Image Restoration |
Eduard Zamfir et.al. |
2411.18466 |
null |
| 2024-11-27 |
Mixture of Experts in Image Classification: What's the Sweet Spot? |
Mathurin Videau et.al. |
2411.18322 |
null |
| 2024-11-26 |
$H^3$ Fusion: Helpful, Harmless, Honest Fusion of Aligned LLMs |
Selim Furkan Tekin et.al. |
2411.17792 |
link |
| 2024-11-25 |
Staleness-Centric Optimizations for Efficient Diffusion MoE Inference |
Jiajun Luo et.al. |
2411.16786 |
null |
| 2024-11-29 |
MH-MoE: Multi-Head Mixture-of-Experts |
Shaohan Huang et.al. |
2411.16205 |
null |
| 2024-11-25 |
LDACP: Long-Delayed Ad Conversions Prediction Model for Bidding Strategy |
Peng Cui et.al. |
2411.16095 |
null |
| 2024-11-24 |
Hiding Communication Cost in Distributed LLM Training via Micro-batch Co-execution |
Haiquan Wang et.al. |
2411.15871 |
null |
| 2024-11-24 |
LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training |
Xiaoye Qu et.al. |
2411.15708 |
link |
| 2024-11-23 |
Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts |
Qizhou Chen et.al. |
2411.15432 |
null |
| 2024-11-23 |
Communication-Efficient Sparsely-Activated Model Training via Sequence Migration and Token Condensation |
Fahao Chen et.al. |
2411.15419 |
null |
| 2024-11-20 |
MERLOT: A Distilled LLM-based Mixture-of-Experts Framework for Scalable Encrypted Traffic Classification |
Yuxuan Chen et.al. |
2411.13004 |
null |
| 2024-11-23 |
KAAE: Numerical Reasoning for Knowledge Graphs via Knowledge-aware Attributes Learning |
Ming Yin et.al. |
2411.12950 |
null |
| 2024-11-19 |
Ultra-Sparse Memory Network |
Zihao Huang et.al. |
2411.12364 |
null |
| 2024-11-18 |
MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs |
Shiyi Cao et.al. |
2411.11217 |
null |
| 2024-11-16 |
Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts |
Jinqiang Long et.al. |
2411.10669 |
link |
| 2024-11-15 |
Weakly-Supervised Multimodal Learning on MIMIC-CXR |
Andrea Agostini et.al. |
2411.10356 |
link |
| 2024-11-21 |
Pro-Prophet: A Systematic Load Balancing Method for Efficient Parallel Training of Large-scale MoE Models |
Wei Wang et.al. |
2411.10003 |
null |
| 2024-11-13 |
Lynx: Enabling Efficient MoE Inference through Dynamic Batch-Aware Expert Selection |
Vima Gupta et.al. |
2411.08982 |
null |
| 2024-11-13 |
Sparse Upcycling: Inference Inefficient Finetuning |
Sasha Doubov et.al. |
2411.08968 |
null |
| 2024-11-13 |
LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing |
Xiaonan Nie et.al. |
2411.08446 |
null |
| 2024-11-12 |
Imitation Learning from Observations: An Autoregressive Mixture of Experts Approach |
Renzi Wang et.al. |
2411.08232 |
null |
| 2024-11-12 |
PERFT: Parameter-Efficient Routed Fine-Tuning for Mixture-of-Expert Model |
Yilun Liu et.al. |
2411.08212 |
null |
| 2024-11-12 |
Towards Vision Mixture of Experts for Wildlife Monitoring on the Edge |
Emmanuel Azuh Mensah et.al. |
2411.07834 |
null |
| 2024-11-11 |
Adaptive Conditional Expert Selection Network for Multi-domain Recommendation |
Kuiyao Dong et.al. |
2411.06826 |
null |
| 2024-11-11 |
WDMoE: Wireless Distributed Mixture of Experts for Large Language Models |
Nan Xue et.al. |
2411.06681 |
null |
| 2024-11-09 |
Learning Mixtures of Experts with EM |
Quentin Fruytier et.al. |
2411.06056 |
null |
| 2024-11-08 |
NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts |
Yen-Ting Lin et.al. |
2411.05945 |
null |
| 2024-11-05 |
DA-MoE: Addressing Depth-Sensitivity in Graph-Level Analysis through Mixture of Experts |
Zelin Yao et.al. |
2411.03025 |
link |
| 2024-11-05 |
Advancing Robust Underwater Acoustic Target Recognition through Multi-task Learning and Multi-Gate Mixture-of-Experts |
Yuan Xie et.al. |
2411.02787 |
null |
| 2024-11-06 |
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent |
Xingwu Sun et.al. |
2411.02265 |
null |
| 2024-11-04 |
FedMoE-DA: Federated Mixture of Experts via Domain Aware Fine-grained Aggregation |
Ziwei Zhan et.al. |
2411.02115 |
null |
| 2024-11-03 |
RS-MoE: Mixture of Experts for Remote Sensing Image Captioning and Visual Question Answering |
Hui Lin et.al. |
2411.01595 |
null |
| 2024-11-03 |
Facet-Aware Multi-Head Mixture-of-Experts Model for Sequential Recommendation |
Mingrui Liu et.al. |
2411.01457 |
null |
| 2024-11-06 |
HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference |
Peng Tang et.al. |
2411.01433 |
null |
| 2024-11-07 |
HEXA-MoE: Efficient and Heterogeneous-aware MoE Acceleration with ZERO Computation Redundancy |
Shuqing Luo et.al. |
2411.01288 |
link |
| 2024-11-02 |
PMoL: Parameter Efficient MoE for Preference Mixing of LLM Alignment |
Dongxu Liu et.al. |
2411.01245 |
null |
| 2024-11-01 |
MoE-I$^2$: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition |
Cheng Yang et.al. |
2411.01016 |
null |
| 2024-11-01 |
LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models |
Nam V. Nguyen et.al. |
2411.00918 |
link |
| 2024-11-01 |
MoNTA: Accelerating Mixture-of-Experts Training with Network-Traffc-Aware Parallel Optimization |
Jingming Guo et.al. |
2411.00662 |
link |
| 2024-10-31 |
Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts |
Xiang Deng et.al. |
2410.23836 |
null |
| 2024-10-30 |
Efficient and Interpretable Grammatical Error Correction with Mixture of Experts |
Muhammad Reza Qorib et.al. |
2410.23507 |
link |
| 2024-10-30 |
Stealing User Prompts from Mixture of Experts |
Itay Yona et.al. |
2410.22884 |
null |
| 2024-10-30 |
MALoRA: Mixture of Asymmetric Low-Rank Adaptation for Enhanced Multi-Task Learning |
Xujia Wang et.al. |
2410.22782 |
null |
| 2024-10-29 |
ProMoE: Fast MoE-based LLM Serving using Proactive Caching |
Xiaoniu Song et.al. |
2410.22134 |
null |
| 2024-10-29 |
Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging |
Li Shen et.al. |
2410.21804 |
null |
| 2024-10-29 |
Neural Experts: Mixture of Experts for Implicit Neural Representations |
Yizhak Ben-Shabat et.al. |
2410.21643 |
null |
| 2024-10-28 |
FinTeamExperts: Role Specialized MOEs For Financial Analysis |
Yue Yu et.al. |
2410.21338 |
null |
| 2024-10-28 |
Efficient Mixture-of-Expert for Video-based Driver State and Physiological Multi-task Estimation in Conditional Autonomous Driving |
Jiyao Wang et.al. |
2410.21086 |
null |
| 2024-10-27 |
Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation |
Maohao Shen et.al. |
2410.20336 |
null |
| 2024-10-27 |
GUMBEL-NERF: Representing Unseen Objects as Part-Compositional Neural Radiance Fields |
Yusuke Sekikawa et.al. |
2410.20306 |
null |
| 2024-10-25 |
DMT-HI: MOE-based Hyperbolic Interpretable Deep Manifold Transformation for Unspervised Dimensionality Reduction |
Zelin Zang et.al. |
2410.19504 |
link |
| 2024-10-25 |
Hierarchical Mixture of Experts: Generalizable Learning for High-Level Synthesis |
Weikai Li et.al. |
2410.19225 |
link |
| 2024-10-24 |
Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design |
Ruisi Cai et.al. |
2410.19123 |
link |
| 2024-10-24 |
Mixture of Parrots: Experts improve memorization more than reasoning |
Samy Jelassi et.al. |
2410.19034 |
null |
| 2024-10-24 |
MoMQ: Mixture-of-Experts Enhances Multi-Dialect Query Generation across Relational and Non-Relational Databases |
Zhisheng Lin et.al. |
2410.18406 |
null |
| 2024-10-23 |
Robust and Explainable Depression Identification from Speech Using Vowel-Based Ensemble Learning Approaches |
Kexin Feng et.al. |
2410.18298 |
null |
| 2024-10-23 |
MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning |
Jingfan Zhang et.al. |
2410.18035 |
null |
| 2024-10-24 |
ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference |
Xin He et.al. |
2410.17954 |
null |
| 2024-10-23 |
Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition |
Artem Basharin et.al. |
2410.17765 |
null |
| 2024-10-22 |
Optimizing Mixture-of-Experts Inference Time Combining Model Deployment and Communication Scheduling |
Jialong Li et.al. |
2410.17043 |
null |
| 2024-10-21 |
LMHaze: Intensity-aware Image Dehazing with a Large-scale Multi-intensity Real Haze Dataset |
Ruikun Zhang et.al. |
2410.16095 |
link |
| 2024-10-22 |
CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts |
Zhenpeng Su et.al. |
2410.16077 |
link |
| 2024-10-21 |
Generalizing Motion Planners with Mixture of Experts for Autonomous Driving |
Qiao Sun et.al. |
2410.15774 |
link |
| 2024-10-21 |
ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts |
Xumeng Han et.al. |
2410.15732 |
null |
| 2024-10-20 |
Unveiling and Consulting Core Experts in Retrieval-Augmented MoE-based LLMs |
Xin Zhou et.al. |
2410.15438 |
null |
| 2024-10-20 |
LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration |
Yuang Ai et.al. |
2410.15385 |
link |
| 2024-10-19 |
MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning |
Suning Huang et.al. |
2410.14972 |
null |
| 2024-10-18 |
MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts |
Rachel S. Y. Teo et.al. |
2410.14574 |
link |
| 2024-10-18 |
ST-MoE-BERT: A Spatial-Temporal Mixture-of-Experts Framework for Long-Term Cross-City Mobility Prediction |
Haoyu He et.al. |
2410.14099 |
link |
| 2024-10-17 |
Enhancing Generalization in Sparse Mixture of Experts Models: The Case for Increased Expert Activation in Compositional Tasks |
Jinze Zhao et.al. |
2410.13964 |
null |
| 2024-10-16 |
On the Risk of Evidence Pollution for Malicious Social Text Detection in the Era of LLMs |
Herun Wan et.al. |
2410.12600 |
null |
| 2024-10-16 |
Understanding Expert Structures on Minimax Parameter Estimation in Contaminated Mixture of Experts |
Fanqi Yan et.al. |
2410.12258 |
null |
| 2024-10-16 |
EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference |
Yulei Qian et.al. |
2410.12247 |
null |
| 2024-10-15 |
MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router |
Yanyue Xie et.al. |
2410.12013 |
null |
| 2024-10-15 |
MoH: Multi-Head Attention as Mixture-of-Head Attention |
Peng Jin et.al. |
2410.11842 |
link |
| 2024-10-15 |
GaVaMoE: Gaussian-Variational Gated Mixture of Experts for Explainable Recommendation |
Fei Tang et.al. |
2410.11841 |
link |
| 2024-10-15 |
Transformer Layer Injection: A Novel Approach for Efficient Upscaling of Large Language Models |
James Vo et.al. |
2410.11654 |
null |
| 2024-10-16 |
Quadratic Gating Functions in Mixture of Experts: A Statistical Insight |
Pedram Akbarian et.al. |
2410.11222 |
null |
| 2024-10-16 |
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free |
Ziyue Li et.al. |
2410.10814 |
link |
| 2024-10-14 |
Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts |
Guorui Zheng et.al. |
2410.10626 |
link |
| 2024-10-14 |
Learning to Ground VLMs without Forgetting |
Aritra Bhowmik et.al. |
2410.10491 |
null |
| 2024-10-14 |
Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts |
Xu Liu et.al. |
2410.10469 |
null |
| 2024-10-15 |
Ada-K Routing: Boosting the Efficiency of MoE-based LLMs |
Tongtian Yue et.al. |
2410.10456 |
null |
| 2024-10-14 |
Tighter Risk Bounds for Mixtures of Experts |
Wissam Akretche et.al. |
2410.10397 |
null |
| 2024-10-14 |
Scalable Multi-Domain Adaptation of Language Models using Modular Experts |
Peter Schafhalter et.al. |
2410.10181 |
null |
| 2024-10-14 |
Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models |
Jun Luo et.al. |
2410.10114 |
link |
| 2024-10-14 |
AlphaLoRA: Assigning LoRA Experts Based on Layer Training Quality |
Peijun Qing et.al. |
2410.10054 |
link |
| 2024-10-13 |
ContextWIN: Whittle Index Based Mixture-of-Experts Neural Model For Restless Bandits Via Deep RL |
Zhanqiu Guo et.al. |
2410.09781 |
null |
| 2024-10-11 |
Semi-Supervised Learning of Noisy Mixture of Experts Models |
Oh-Ran Kwon et.al. |
2410.09039 |
null |
| 2024-10-11 |
Retraining-Free Merging of Sparse Mixture-of-Experts via Hierarchical Clustering |
I-Chun Chen et.al. |
2410.08589 |
link |
| 2024-10-10 |
Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts |
Sukwon Yun et.al. |
2410.08245 |
link |
| 2024-10-10 |
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training |
Gen Luo et.al. |
2410.08202 |
null |
| 2024-10-10 |
Efficient Dictionary Learning with Switch Sparse Autoencoders |
Anish Mudide et.al. |
2410.08201 |
link |
| 2024-10-10 |
More Experts Than Galaxies: Conditionally-overlapping Experts With Biologically-Inspired Fixed Routing |
Sagi Shaier et.al. |
2410.08003 |
link |
| 2024-10-10 |
SLIM: Let LLM Learn More and Forget Less with Soft LoRA and Identity Mixture |
Jiayi Han et.al. |
2410.07739 |
null |
| 2024-10-10 |
Upcycling Large Language Models into Mixture of Experts |
Ethan He et.al. |
2410.07524 |
null |
| 2024-10-09 |
MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts |
Peng Jin et.al. |
2410.07348 |
link |
| 2024-10-09 |
Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders |
David Noever et.al. |
2410.06462 |
null |
| 2024-10-09 |
Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs |
Ruijia Niu et.al. |
2410.06431 |
null |
| 2024-10-08 |
Probing the Robustness of Theory of Mind in Large Language Models |
Christian Nickel et.al. |
2410.06271 |
null |
| 2024-10-08 |
MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More |
Wei Huang et.al. |
2410.06270 |
link |
| 2024-10-08 |
Aria: An Open Multimodal Native Mixture-of-Experts Model |
Dongxu Li et.al. |
2410.05993 |
link |
| 2024-10-08 |
Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models |
Siqi Wang et.al. |
2410.05661 |
null |
| 2024-10-07 |
Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild |
Xinyu Zhao et.al. |
2410.05357 |
link |
| 2024-10-07 |
Multimodal Fusion Strategies for Mapping Biophysical Landscape Features |
Lucia Gordon et.al. |
2410.04833 |
link |
| 2024-10-06 |
Realizing Video Summarization from the Path of Language-based Semantic Understanding |
Kuan-Chen Mu et.al. |
2410.04511 |
null |
| 2024-10-09 |
Structure-Enhanced Protein Instruction Tuning: Towards General-Purpose Protein Understanding |
Wei Wu et.al. |
2410.03553 |
null |
| 2024-10-04 |
Exploring the Benefit of Activation Sparsity in Pre-training |
Zhengyan Zhang et.al. |
2410.03440 |
link |
| 2024-10-03 |
MLP-KAN: Unifying Deep Representation and Function Learning |
Yunhong He et.al. |
2410.03027 |
link |
| 2024-10-03 |
On Expert Estimation in Hierarchical Mixture of Experts: Beyond Softmax Gating Functions |
Huy Nguyen et.al. |
2410.02935 |
null |
| 2024-10-03 |
Neutral residues: revisiting adapters for model extension |
Franck Signe Talla et.al. |
2410.02744 |
null |
| 2024-10-03 |
Efficient Residual Learning with Mixture-of-Experts for Universal Dexterous Grasping |
Ziye Huang et.al. |
2410.02475 |
null |
| 2024-10-03 |
MIGA: Mixture-of-Experts with Group Aggregation for Stock Market Prediction |
Zhaojian Yu et.al. |
2410.02241 |
null |
| 2024-10-03 |
Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts |
Minh Le et.al. |
2410.02200 |
link |
| 2024-10-04 |
Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices |
Andres Potapczynski et.al. |
2410.02117 |
link |
| 2024-10-04 |
EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing |
Haotian Sun et.al. |
2410.02098 |
null |
| 2024-10-02 |
Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL |
Ghada Sokar et.al. |
2410.01930 |
null |
| 2024-10-02 |
Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models |
Shayekh Bin Islam et.al. |
2410.01782 |
link |
| 2024-10-02 |
Upcycling Instruction Tuning from Dense to Mixture-of-Experts via Parameter Merging |
Tingfeng Hui et.al. |
2410.01610 |
null |
| 2024-10-02 |
The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs |
Hong Li et.al. |
2410.01417 |
null |
| 2024-10-01 |
MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards |
Sheng Wang et.al. |
2410.00938 |
null |
| 2024-10-01 |
UniAdapt: A Universal Adapter for Knowledge Calibration |
Tai D. Nguyen et.al. |
2410.00454 |
null |
| 2024-10-01 |
Robust Traffic Forecasting against Spatial Shift over Years |
Hongjun Wang et.al. |
2410.00373 |
link |
| 2024-09-29 |
IDEA: An Inverse Domain Expert Adaptation Based Active DNN IP Protection Method |
Chaohui Xu et.al. |
2410.00059 |
null |
| 2024-09-30 |
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning |
Haotian Zhang et.al. |
2409.20566 |
null |
| 2024-10-02 |
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling |
Jihai Zhang et.al. |
2409.19291 |
link |
| 2024-09-27 |
SciDFM: A Large Language Model with Mixture-of-Experts for Science |
Liangtai Sun et.al. |
2409.18412 |
null |
| 2024-09-26 |
Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE |
Xun Zhu et.al. |
2409.17508 |
link |
| 2024-09-26 |
A Time Series is Worth Five Experts: Heterogeneous Mixture of Experts for Traffic Flow Prediction |
Guangyu Wang et.al. |
2409.17440 |
link |
| 2024-09-24 |
Leveraging Mixture of Experts for Improved Speech Deepfake Detection |
Viola Negroni et.al. |
2409.16077 |
null |
| 2024-10-02 |
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts |
Xiaoming Shi et.al. |
2409.16040 |
link |
| 2024-09-24 |
Boosting Code-Switching ASR with Mixture of Experts Enhanced Speech-Conditioned LLM |
Fengrun Zhang et.al. |
2409.15905 |
null |
| 2024-09-24 |
Toward Mixture-of-Experts Enabled Trustworthy Semantic Communication for 6G Networks |
Jiayi He et.al. |
2409.15695 |
null |
| 2024-09-23 |
A Gated Residual Kolmogorov-Arnold Networks for Mixtures of Experts |
Hugo Inzirillo et.al. |
2409.15161 |
link |
| 2024-09-23 |
Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond |
Hong Chen et.al. |
2409.14993 |
null |
| 2024-09-21 |
Routing in Sparsely-gated Language Models responds to Context |
Stefan Arnold et.al. |
2409.14107 |
null |
| 2024-09-20 |
On-device Collaborative Language Modeling via a Mixture of Generalists and Specialists |
Dongyang Fan et.al. |
2409.13931 |
link |
| 2024-09-20 |
Multi-omics data integration for early diagnosis of hepatocellular carcinoma (HCC) using machine learning |
Annette Spooner et.al. |
2409.13791 |
null |
| 2024-09-19 |
Robust Audiovisual Speech Recognition Models with Mixture-of-Experts |
Yihan Wu et.al. |
2409.12370 |
null |
| 2024-09-18 |
GRIN: GRadient-INformed MoE |
Liyuan Liu et.al. |
2409.12136 |
null |
| 2024-09-18 |
Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0 |
Zhiyong Wang et.al. |
2409.11909 |
link |
| 2024-09-17 |
LPT++: Efficient Training on Mixture of Long-tailed Experts |
Bowen Dong et.al. |
2409.11323 |
null |
| 2024-09-19 |
LOLA -- An Open-Source Massively Multilingual Large Language Model |
Nikit Srivastava et.al. |
2409.11272 |
link |
| 2024-09-16 |
Adaptive Segmentation-Based Initialization for Steered Mixture of Experts Image Regression |
Yi-Hsin Li et.al. |
2409.10101 |
null |
| 2024-09-14 |
MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving |
Enming Zhang et.al. |
2409.07267 |
link |
| 2024-09-10 |
DA-MoE: Towards Dynamic Expert Allocation for Mixture-of-Experts Models |
Maryam Akhavan Aghdam et.al. |
2409.06669 |
null |
| 2024-09-10 |
STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning |
Jaeseong Lee et.al. |
2409.06211 |
null |
| 2024-09-10 |
VE: Modeling Multivariate Time Series Correlation with Variate Embedding |
Shangjiong Wang et.al. |
2409.06169 |
link |
| 2024-09-09 |
Alt-MoE: Multimodal Alignment via Alternating Optimization of Multi-directional MoE with Unimodal Models |
Hongyang Lei et.al. |
2409.05929 |
link |
| 2024-09-09 |
Optical Spiking Neurons Enable High-Speed and Energy-Efficient Optical Neural Networks |
Bo Xu et.al. |
2409.05726 |
null |
| 2024-09-09 |
Adapted-MoE: Mixture of Experts with Test-Time Adaption for Anomaly Detection |
Tianwu Lei et.al. |
2409.05611 |
null |
| 2024-09-05 |
Interpretable mixture of experts for time series prediction under recurrent and non-recurrent conditions |
Zemian Ke et.al. |
2409.03282 |
null |
| 2024-09-05 |
ChartMoE: Mixture of Expert Connector for Advanced Chart Understanding |
Zhengzhuo Xu et.al. |
2409.03277 |
null |
| 2024-09-05 |
xLAM: A Family of Large Action Models to Empower AI Agent Systems |
Jianguo Zhang et.al. |
2409.03215 |
link |
| 2024-09-04 |
Configurable Foundation Models: Building LLMs from a Modular Perspective |
Chaojun Xiao et.al. |
2409.02877 |
null |
| 2024-09-04 |
Pluralistic Salient Object Detection |
Xuelu Feng et.al. |
2409.02368 |
null |
| 2024-09-03 |
OLMoE: Open Mixture-of-Experts Language Models |
Niklas Muennighoff et.al. |
2409.02060 |
link |
| 2024-09-05 |
Enhancing Code-Switching Speech Recognition with LID-Based Collaborative Mixture of Experts Model |
Hukai Huang et.al. |
2409.02050 |
null |
| 2024-09-02 |
Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning |
Soumajyoti Sarkar et.al. |
2409.01483 |
null |
| 2024-09-02 |
Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching |
Sungmin Yun et.al. |
2409.01141 |
null |
| 2024-09-04 |
Unveiling the Vulnerability of Private Fine-Tuning in Split-Based Frameworks for Large Language Models: A Bidirectionally Enhanced Attack |
Guanzhong Chen et.al. |
2409.00960 |
link |
| 2024-09-02 |
Beyond Parameter Count: Implicit Bias in Soft Mixture of Experts |
Youngseog Chung et.al. |
2409.00879 |
null |
| 2024-08-29 |
Gradient-free variational learning with conditional mixture networks |
Conor Heins et.al. |
2408.16429 |
link |
| 2024-08-28 |
Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models |
Yuncheng Yang et.al. |
2408.15915 |
link |
| 2024-08-28 |
Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts |
Nikolas Gritsch et.al. |
2408.15901 |
null |
| 2024-08-28 |
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation |
Fangxun Shu et.al. |
2408.15881 |
link |
| 2024-08-28 |
Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts |
Lean Wang et.al. |
2408.15664 |
null |
| 2024-08-27 |
Parameter-Efficient Quantized Mixture-of-Experts Meets Vision-Language Instruction Tuning for Semiconductor Electron Micrograph Analysis |
Sakhinana Sagar Srinivas et.al. |
2408.15305 |
null |
| 2024-08-27 |
MRSE: An Efficient Multi-modality Retrieval System for Large Scale E-commerce |
Hao Jiang et.al. |
2408.14968 |
null |
| 2024-08-24 |
Advancing Enterprise Spatio-Temporal Forecasting Applications: Data Mining Meets Instruction Tuning of Language Models For Multi-modal Time Series Analysis in Low-Resource Settings |
Sagar Srinivas Sakhinana et.al. |
2408.13622 |
null |
| 2024-08-23 |
The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities |
Venkatesh Balavadhani Parthasarathy et.al. |
2408.13296 |
null |
| 2024-08-23 |
Guiding IoT-Based Healthcare Alert Systems with Large Language Models |
Yulan Gao et.al. |
2408.13071 |
null |
| 2024-08-23 |
DutyTTE: Deciphering Uncertainty in Origin-Destination Travel Time Estimation |
Xiaowei Mao et.al. |
2408.12809 |
link |
| 2024-08-23 |
Multi-Treatment Multi-Task Uplift Modeling for Enhancing User Growth |
Yuxiang Wei et.al. |
2408.12803 |
null |
| 2024-08-23 |
La-SoftMoE CLIP for Unified Physical-Digital Face Attack Detection |
Hang Zou et.al. |
2408.12793 |
null |
| 2024-08-22 |
SQL-GEN: Bridging the Dialect Gap for Text-to-SQL Via Synthetic Data And Model Merging |
Mohammadreza Pourreza et.al. |
2408.12733 |
null |
| 2024-08-22 |
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale |
Jamba Team et.al. |
2408.12570 |
null |
| 2024-08-22 |
Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators |
Dingkang Yang et.al. |
2408.12325 |
link |
| 2024-08-21 |
MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing |
Hao Zhou et.al. |
2408.11396 |
link |
| 2024-08-21 |
KAN4TSF: Are KAN and KAN-based models Effective for Time Series Forecasting? |
Xiao Han et.al. |
2408.11306 |
link |
| 2024-08-21 |
FedMoE: Personalized Federated Learning via Heterogeneous Mixture of Experts |
Hanzi Mei et.al. |
2408.11304 |
null |
| 2024-08-20 |
Unboxing Occupational Bias: Grounded Debiasing LLMs with U.S. Labor Data |
Atmika Gorti et.al. |
2408.11247 |
null |
| 2024-08-20 |
Navigating Spatio-Temporal Heterogeneity: A Graph Transformer Approach for Traffic Forecasting |
Jianxiang Zhou et.al. |
2408.10822 |
link |
| 2024-08-20 |
AnyGraph: Graph Foundation Model in the Wild |
Lianghao Xia et.al. |
2408.10700 |
link |
| 2024-08-20 |
HMoE: Heterogeneous Mixture of Experts for Language Modeling |
An Wang et.al. |
2408.10681 |
null |
| 2024-08-19 |
AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference |
Shuzhang Zhong et.al. |
2408.10284 |
link |
| 2024-08-17 |
FEDKIM: Adaptive Federated Knowledge Injection into Medical Foundation Models |
Xiaochen Wang et.al. |
2408.10276 |
link |
| 2024-08-19 |
Customizing Language Models with Instance-wise LoRA for Sequential Recommendation |
Xiaoyu Kong et.al. |
2408.10159 |
link |
| 2024-08-19 |
A Unified Framework for Iris Anti-Spoofing: Introducing IrisGeneral Dataset and Masked-MoE Method |
Hang Zou et.al. |
2408.09752 |
null |
| 2024-08-16 |
Integrating Multi-view Analysis: Multi-view Mixture-of-Expert for Textual Personality Detection |
Haohao Zhu et.al. |
2408.08551 |
link |
| 2024-08-17 |
BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts |
Qizhen Zhang et.al. |
2408.08274 |
null |
| 2024-08-14 |
Beyond Inter-Item Relations: Dynamic Adaptive Mixture-of-Experts for LLM-Based Sequential Recommendation |
CanYi Liu et.al. |
2408.07427 |
null |
| 2024-08-13 |
A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning |
Prateek Yadav et.al. |
2408.07057 |
null |
| 2024-08-13 |
Layerwise Recurrent Router for Mixture-of-Experts |
Zihan Qiu et.al. |
2408.06793 |
link |
| 2024-08-13 |
AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies |
Bo-Wen Zhang et.al. |
2408.06567 |
null |
| 2024-08-10 |
HoME: Hierarchy of Multi-Gate Experts for Multi-Task Learning at Kuaishou |
Xu Wang et.al. |
2408.05430 |
null |
| 2024-08-08 |
Understanding the Performance and Estimating the Cost of LLM Fine-Tuning |
Yuchen Xia et.al. |
2408.04693 |
link |
| 2024-08-08 |
Partial Experts Checkpoint: Efficient Fault Tolerance for Sparse Mixture-of-Experts Model Training |
Weilin Cai et.al. |
2408.04307 |
null |
| 2024-08-08 |
LaDiMo: Layer-wise Distillation Inspired MoEfier |
Sungyoon Kim et.al. |
2408.04278 |
null |
| 2024-08-07 |
MoExtend: Tuning New Experts for Modality and Task Extension |
Shanshan Zhong et.al. |
2408.03511 |
link |
| 2024-08-05 |
Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and Localization |
Changtao Miao et.al. |
2408.02306 |
null |
| 2024-08-02 |
HMDN: Hierarchical Multi-Distribution Network for Click-Through Rate Prediction |
Xingyu Lou et.al. |
2408.01332 |
null |
| 2024-08-01 |
Multimodal Fusion and Coherence Modeling for Video Topic Segmentation |
Hai Yu et.al. |
2408.00365 |
null |
| 2024-08-12 |
MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts |
Xi Victoria Lin et.al. |
2407.21770 |
null |
| 2024-07-31 |
PMoE: Progressive Mixture of Experts with Asymmetric Transformer for Continual Learning |
Min Jae Jung et.al. |
2407.21571 |
null |
| 2024-07-30 |
Distribution Learning for Molecular Regression |
Nima Shoghi et.al. |
2407.20475 |
null |
| 2024-07-29 |
Time series forecasting with high stakes: A field study of the air cargo industry |
Abhinav Garg et.al. |
2407.20192 |
null |
| 2024-07-30 |
Mixture of Nested Experts: Adaptive Processing of Visual Tokens |
Gagan Jain et.al. |
2407.19985 |
null |
| 2024-07-28 |
Mixture of Modular Experts: Distilling Knowledge from a Multilingual Teacher into Specialized Modular Language Models |
Mohammed Al-Maamari et.al. |
2407.19610 |
link |
| 2024-07-26 |
Wolf: Captioning Everything with a World Summarization Framework |
Boyi Li et.al. |
2407.18908 |
null |
| 2024-07-26 |
MOoSE: Multi-Orientation Sharing Experts for Open-set Scene Text Recognition |
Chang Liu et.al. |
2407.18616 |
link |
| 2024-07-26 |
Dynamic Language Group-Based MoE: Enhancing Efficiency and Flexibility for Code-Switching Speech Recognition |
Hukai Huang et.al. |
2407.18581 |
link |
| 2024-07-25 |
How Lightweight Can A Vision Transformer Be |
Jen Hong Tan et.al. |
2407.17783 |
null |
| 2024-07-24 |
Exploring Domain Robust Lightweight Reward Models based on Router Mechanism |
Hyuk Namgoong et.al. |
2407.17546 |
null |
| 2024-07-24 |
M4: Multi-Proxy Multi-Gate Mixture of Experts Network for Multiple Instance Learning in Histopathology Image Analysis |
Junyu Li et.al. |
2407.17267 |
link |
| 2024-07-25 |
Cheems: Wonderful Matrices More Efficient and More Effective Architecture |
Jingze Shi et.al. |
2407.16958 |
null |
| 2024-07-22 |
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget |
Vikash Sehwag et.al. |
2407.15811 |
link |
| 2024-07-22 |
Norface: Improving Facial Expression Analysis by Identity Normalization |
Hanwei Liu et.al. |
2407.15617 |
link |
| 2024-07-19 |
Mixture of Experts with Mixture of Precisions for Tuning Quality of Service |
HamidReza Imani et.al. |
2407.14417 |
null |
| 2024-07-19 |
EVLM: An Efficient Vision-Language Model for Visual Understanding |
Kaibing Chen et.al. |
2407.14177 |
null |
| 2024-07-19 |
Routing Experts: Learning to Route Dynamic Experts in Multi-modal Large Language Models |
Qiong Wu et.al. |
2407.14093 |
null |
| 2024-07-18 |
Discussion: Effective and Interpretable Outcome Prediction by Training Sparse Mixtures of Linear Experts |
Francesco Folino et.al. |
2407.13526 |
null |
| 2024-07-18 |
Mixture of Experts based Multi-task Supervise Learning from Crowds |
Tao Han et.al. |
2407.13268 |
null |
| 2024-07-15 |
MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration |
Yulin Ren et.al. |
2407.10833 |
null |
| 2024-07-18 |
Qwen2 Technical Report |
An Yang et.al. |
2407.10671 |
link |
| 2024-07-15 |
Boost Your NeRF: A Model-Agnostic Mixture of Experts Framework for High Quality and Efficient Rendering |
Francesco Di Sario et.al. |
2407.10389 |
null |
| 2024-07-13 |
Low-Rank Interconnected Adaptation Across Layers |
Yibo Zhong et.al. |
2407.09946 |
link |
| 2024-07-13 |
MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts |
Zhenpeng Su et.al. |
2407.09816 |
link |
| 2024-07-12 |
Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts |
Zeliang Zhang et.al. |
2407.09590 |
null |
| 2024-07-11 |
An Unsupervised Domain Adaptation Method for Locating Manipulated Region in partially fake Audio |
Siding Zeng et.al. |
2407.08239 |
null |
| 2024-07-10 |
MoVEInt: Mixture of Variational Experts for Learning Human-Robot Interactions from Demonstrations |
Vignesh Prasad et.al. |
2407.07636 |
link |
| 2024-07-10 |
Swin SMT: Global Sequential Modeling in 3D Medical Image Segmentation |
Szymon Płotka et.al. |
2407.07514 |
link |
| 2024-07-09 |
A Simple Architecture for Enterprise Large Language Model Applications based on Role based security and Clearance Levels using Retrieval-Augmented Generation or Mixture of Experts |
Atilla Özgür et.al. |
2407.06718 |
null |
| 2024-07-06 |
SAM-Med3D-MoE: Towards a Non-Forgetting Segment Anything Model via Mixture of Experts for 3D Medical Image Segmentation |
Guoan Wang et.al. |
2407.04938 |
null |
| 2024-07-06 |
Completed Feature Disentanglement Learning for Multimodal MRIs Analysis |
Tianling Liu et.al. |
2407.04916 |
link |
| 2024-07-05 |
YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer Architectures and Cross-dataset Stem Augmentation |
Sungkyun Chang et.al. |
2407.04822 |
link |
| 2024-07-05 |
Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models with Adaptive Expert Placement |
Yongji Wu et.al. |
2407.04656 |
null |
| 2024-07-05 |
MobileFlow: A Multimodal LLM For Mobile GUI Agent |
Songqin Nong et.al. |
2407.04346 |
null |
| 2024-07-04 |
Mixture of A Million Experts |
Xu Owen He et.al. |
2407.04153 |
null |
| 2024-07-02 |
Terminating Differentiable Tree Experts |
Jonathan Thomm et.al. |
2407.02060 |
null |
| 2024-07-05 |
Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models |
Zihan Wang et.al. |
2407.01906 |
link |
| 2024-07-01 |
Uncertainty Quantification in Table Structure Recognition |
Kehinde Ajayi et.al. |
2407.01731 |
link |
| 2024-07-01 |
Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning |
Yixiao Wang et.al. |
2407.01531 |
null |
| 2024-07-01 |
Investigating the potential of Sparse Mixtures-of-Experts for multi-domain neural machine translation |
Nadezhda Chirkova et.al. |
2407.01126 |
null |
| 2024-07-01 |
Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs |
Enshu Liu et.al. |
2407.00945 |
link |
| 2024-07-03 |
Parm: Efficient Training of Large Sparsely-Activated Models with Dedicated Schedules |
Xinglin Pan et.al. |
2407.00599 |
link |
| 2024-07-02 |
One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts |
Ruochen Wang et.al. |
2407.00256 |
null |
| 2024-06-28 |
LEMoE: Advanced Mixture of Experts Adaptor for Lifelong Model Editing of Large Language Models |
Renzhi Wang et.al. |
2406.20030 |
null |
| 2024-06-28 |
Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model |
Longrong Yang et.al. |
2406.19905 |
link |
| 2024-06-28 |
SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR |
Qiuming Zhao et.al. |
2406.19706 |
link |
| 2024-06-27 |
A Teacher Is Worth A Million Instructions |
Nikhil Kothari et.al. |
2406.19112 |
link |
| 2024-06-27 |
Towards Personalized Federated Multi-scenario Multi-task Recommendation |
Yue Ding et.al. |
2406.18938 |
null |
| 2024-06-26 |
Mixture of Experts in a Mixture of RL settings |
Timon Willi et.al. |
2406.18420 |
null |
| 2024-06-26 |
A Closer Look into Mixture-of-Experts in Large Language Models |
Ka Man Lo et.al. |
2406.18219 |
link |
| 2024-06-26 |
SC-MoE: Switch Conformer Mixture of Experts for Unified Streaming and Non-streaming Code-Switching ASR |
Shuaishuai Ye et.al. |
2406.18021 |
null |
| 2024-06-24 |
Peirce in the Machine: How Mixture of Experts Models Perform Hypothesis Construction |
Bruce Rushing et.al. |
2406.17150 |
link |
| 2024-06-24 |
LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training |
Tong Zhu et.al. |
2406.16554 |
link |
| 2024-06-25 |
OTCE: Hybrid SSM and Attention with Cross Domain Mixture of Experts to construct Observer-Thinker-Conceiver-Expresser |
Jingze Shi et.al. |
2406.16495 |
link |
| 2024-06-24 |
Theory on Mixture-of-Experts in Continual Learning |
Hongbo Li et.al. |
2406.16437 |
null |
| 2024-06-22 |
SimSMoE: Solving Representational Collapse via Similarity Measure |
Giang Do et.al. |
2406.15883 |
null |
| 2024-06-20 |
Voice Disorder Analysis: a Transformer-based Approach |
Alkis Koudounas et.al. |
2406.14693 |
link |
| 2024-06-19 |
Low-Rank Mixture-of-Experts for Continual Medical Image Segmentation |
Qian Chen et.al. |
2406.13583 |
null |
| 2024-06-19 |
AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models |
Zihao Zeng et.al. |
2406.13233 |
link |
| 2024-06-18 |
Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts |
Haoxiang Wang et.al. |
2406.12845 |
link |
| 2024-06-18 |
P-Tailor: Customizing Personality Traits for Language Models via Mixture of Specialized LoRA Experts |
Yuhao Dan et.al. |
2406.12548 |
null |
| 2024-06-18 |
Variational Distillation of Diffusion Policies into Mixture of Experts |
Hongyi Zhou et.al. |
2406.12538 |
null |
| 2024-06-18 |
GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory |
Haoze Wu et.al. |
2406.12375 |
link |
| 2024-06-17 |
Not Eliminate but Aggregate: Post-Hoc Control over Mixture-of-Experts to Address Shortcut Shifts in Natural Language Understanding |
Ukyo Honda et.al. |
2406.12060 |
link |
| 2024-06-17 |
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence |
DeepSeek-AI et.al. |
2406.11931 |
link |
| 2024-06-17 |
Graph Knowledge Distillation to Mixture of Experts |
Pavel Rumiantsev et.al. |
2406.11919 |
link |
| 2024-06-17 |
MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts |
Guanjie Chen et.al. |
2406.11353 |
link |
| 2024-06-17 |
Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts |
Tong Zhu et.al. |
2406.11256 |
link |
| 2024-06-14 |
Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion |
Anke Tang et.al. |
2406.09770 |
link |
| 2024-06-13 |
DeepUnifiedMom: Unified Time-series Momentum Portfolio Construction via Multi-Task Learning with Multi-Gate Mixture of Experts |
Joel Ong et.al. |
2406.08742 |
link |
| 2024-06-12 |
Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark |
Pingzhi Li et.al. |
2406.08155 |
link |
| 2024-06-11 |
Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters |
Yixin Song et.al. |
2406.05955 |
null |
| 2024-06-08 |
Flexible and Adaptable Summarization via Expertise Separation |
Xiuying Chen et.al. |
2406.05360 |
link |
| 2024-06-07 |
MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter |
Jitai Hao et.al. |
2406.04984 |
link |
| 2024-06-07 |
MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks |
Xingkui Zhu et.al. |
2406.04801 |
link |
| 2024-06-05 |
Style Mixture of Experts for Expressive Text-To-Speech Synthesis |
Ahad Jawaid et.al. |
2406.03637 |
null |
| 2024-06-05 |
Node-wise Filtering in Graph Neural Networks: A Mixture of Experts Approach |
Haoyu Han et.al. |
2406.03464 |
null |
| 2024-06-05 |
Continual Traffic Forecasting via Mixture of Experts |
Sanghyun Lee et.al. |
2406.03140 |
null |
| 2024-06-05 |
Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models |
Raeid Saqur et.al. |
2406.02969 |
null |
| 2024-06-04 |
Parrot: Multilingual Visual Instruction Tuning |
Hai-Long Sun et.al. |
2406.02539 |
link |
| 2024-06-04 |
Demystifying the Compression of Mixture-of-Experts Through a Unified Framework |
Shwai He et.al. |
2406.02500 |
link |
| 2024-06-02 |
Reservoir History Matching of the Norne field with generative exotic priors and a coupled Mixture of Experts -- Physics Informed Neural Operator Forward Model |
Clement Etienam et.al. |
2406.00889 |
link |
| 2024-06-01 |
A Gaussian Process-based Streaming Algorithm for Prediction of Time Series With Regimes and Outliers |
Daniel Waxman et.al. |
2406.00570 |
link |
| 2024-06-01 |
Optimizing 6G Integrated Sensing and Communications (ISAC) via Expert Networks |
Jiacheng Wang et.al. |
2406.00408 |
null |
| 2024-05-30 |
Low-dimensional approximations of the conditional law of Volterra processes: a non-positive curvature approach |
Reza Arabpour et.al. |
2405.20094 |
null |
| 2024-06-02 |
MEMoE: Enhancing Model Editing with Mixture of Experts Adaptors |
Renzhi Wang et.al. |
2405.19086 |
null |
| 2024-06-02 |
Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design |
Markus J. Buehler et.al. |
2405.19076 |
link |
| 2024-05-29 |
Learning Mixture-of-Experts for General-Purpose Black-Box Discrete Optimization |
Shengcai Liu et.al. |
2405.18884 |
link |
| 2024-05-29 |
MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models |
Taehyun Kim et.al. |
2405.18832 |
null |
| 2024-05-29 |
Yuan 2.0-M32: Mixture of Experts with Attention Router |
Shaohua Wu et.al. |
2405.17976 |
link |
| 2024-05-28 |
LoRA-Switch: Boosting the Efficiency of Dynamic LLM Adapters via System-Algorithm Co-design |
Rui Kong et.al. |
2405.17741 |
null |
| 2024-05-27 |
Enhancing Fast Feed Forward Networks with Load Balancing and a Master Leaf Node |
Andreas Charalampopoulos et.al. |
2405.16836 |
link |
| 2024-05-26 |
Mixture of Experts Using Tensor Products |
Zhan Su et.al. |
2405.16671 |
link |
| 2024-05-30 |
A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts |
Mohammed Nowaz Rabbani Chowdhury et.al. |
2405.16646 |
null |
| 2024-05-26 |
Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation |
Rongyu Zhang et.al. |
2405.16486 |
link |
| 2024-05-25 |
MoEUT: Mixture-of-Experts Universal Transformers |
Róbert Csordás et.al. |
2405.16039 |
link |
| 2024-05-23 |
Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training |
Xianzhi Du et.al. |
2405.15052 |
link |
| 2024-05-23 |
Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast |
Chufan Shi et.al. |
2405.14507 |
link |
| 2024-05-23 |
Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models |
Yongxin Guo et.al. |
2405.14297 |
link |
| 2024-05-23 |
Graph Sparsification via Mixture of Graphs |
Guibin Zhang et.al. |
2405.14260 |
link |
| 2024-05-23 |
Statistical Advantages of Perturbing Cosine Router in Sparse Mixture of Experts |
Huy Nguyen et.al. |
2405.14131 |
null |
| 2024-05-23 |
Mixture of Experts Meets Prompt-Based Continual Learning |
Minh Le et.al. |
2405.14124 |
link |
| 2024-05-22 |
Sigmoid Gating is More Sample Efficient than Softmax Gating in Mixture of Experts |
Huy Nguyen et.al. |
2405.13997 |
null |
| 2024-05-22 |
xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token |
Xin Cheng et.al. |
2405.13792 |
link |
| 2024-05-24 |
MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models |
Jingwei Xu et.al. |
2405.13053 |
link |
| 2024-05-21 |
Optimizing Generative AI Networking: A Dual Perspective with Multi-Agent Systems and Mixture of Experts |
Ruichen Zhang et.al. |
2405.12472 |
null |
| 2024-05-21 |
Ensemble and Mixture-of-Experts DeepONets For Operator Learning |
Ramansh Sharma et.al. |
2405.11907 |
link |
| 2024-05-19 |
Learning More Generalized Experts by Merging Experts in Mixture-of-Experts |
Sejik Park et.al. |
2405.11530 |
null |
| 2024-05-18 |
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts |
Yunxin Li et.al. |
2405.11273 |
link |
| 2024-05-16 |
Many Hands Make Light Work: Task-Oriented Dialogue System with Module-Based Mixture-of-Experts |
Ruolin Su et.al. |
2405.09744 |
null |
| 2024-05-15 |
M⁴oE: A Foundation Model for Medical Multimodal Image Segmentation with Mixture of Experts |
Yufeng Jiang et.al. |
2405.09446 |
link |
| 2024-05-13 |
Harnessing Hierarchical Label Distribution Variations in Test Agnostic Long-tail Recognition |
Zhiyong Yang et.al. |
2405.07780 |
link |
| 2024-05-07 |
SUTRA: Scalable Multilingual Language Model Architecture |
Abhijit Bendale et.al. |
2405.06694 |
null |
| 2024-05-09 |
A Mixture of Experts Approach to 3D Human Motion Prediction |
Edmund Shieh et.al. |
2405.06088 |
link |
| 2024-05-09 |
A Mixture-of-Experts Approach to Few-Shot Task Transfer in Open-Ended Text Worlds |
Christopher Z. Cui et.al. |
2405.06059 |
null |
| 2024-05-09 |
EWMoE: An effective model for global weather forecasting with mixture-of-experts |
Lihao Gan et.al. |
2405.06004 |
link |
| 2024-05-09 |
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts |
Jiachen Li et.al. |
2405.05949 |
link |
| 2024-05-16 |
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model |
DeepSeek-AI et.al. |
2405.04434 |
link |
| 2024-05-07 |
Enhancing Physical Layer Communication Security through Generative AI with Mixture of Experts |
Changyuan Zhao et.al. |
2405.04198 |
null |
| 2024-05-06 |
Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training |
Zexuan Zhong et.al. |
2405.03133 |
null |
| 2024-05-06 |
WDMoE: Wireless Distributed Large Language Models with Mixture of Experts |
Nan Xue et.al. |
2405.03131 |
null |
| 2024-05-31 |
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models |
Xudong Lu et.al. |
2402.14800 |
null |
| 2024-10-29 |
GraphMETRO: Mitigating Complex Graph Distribution Shifts via Mixture of Aligned Experts |
Shirley Wu et.al. |
2312.04693 |
null |
| 2021-05-25 |
Tensor-variate Mixture of Experts for Proportional Myographic Control of a Robotic Hand |
Noémie Jaquier et.al. |
1902.11104 |
null |
| 2018-06-22 |
Mixtures of Experts Models |
Isobel Claire Gormley et.al. |
1806.08200 |
null |