FastLM

12 repositories

CSV-Decode
Public
CSV-Decode: Certifiable Sub-Vocabulary Decoding for Efficient Large Language Model Inference
Python
•0•12•0•0•Updated Feb 26, 2026
CXL-SpecKV
Public
[FPGA'26 Best Paper Nominee] CXL-SpecKV: A Disaggregated FPGA Speculative KV-Cache for Datacenter LLM Serving
C++
•1•18•0•0•Updated Feb 23, 2026
tinyserve-vllm
Public
[ACM MM 2025 Oral] TinyServe: Query-Aware Page Allocation Optimization
Shell
•2•10•0•0•Updated Jan 18, 2026
SPI_VecDB
Public
Distributed Parallel Multi-Resolution Vector Search
Go
•Apache License 2.0•0•9•0•0•Updated Jan 16, 2026
HSGM
Public
[ICPADS 2025 Oral, *SEM 2025 Oral] HSGM: Hierarchical Segment-Graph Memory for Scalable Long-Text Semantics
Python
•MIT License•0•8•0•0•Updated Nov 23, 2025
CogLoad
Public
Cognitive Load Traces
Python
•0•1•0•0•Updated Nov 3, 2025
NeuroSpec
Public
Grammar- and Resource-Aligned Certifiable Speculative Decoding
Python
•0•0•0•0•Updated Oct 31, 2025
PiKV
Public
PiKV: KV Cache Management System for MoE [Efficient ML System]
Python
•Other•7•4•0•0•Updated Oct 26, 2025
GraphSnapShot
Public
GraphSnapShot: Caching Local Structure for Fast Graph Learning [Efficient ML System]
Python
•6•2•0•0•Updated Sep 22, 2025
FastCache
Public
FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation [Efficient ML Model]
Python
•Apache License 2.0•312•7•0•0•Updated Sep 22, 2025
SemToken
Public
[IWCS 2025 Oral] SemToken: Semantic-Aware Tokenization for Efficient Long-Context Language Modeling
Python
•0•5•0•0•Updated Sep 21, 2025
QTM
Public
https://www.arxiv.org/abs/2508.13204
Python
•3•0•0•0•Updated Sep 21, 2025