Commit 7247770

🔥[Inf-MLLM] Inf-MLLM: Efficient Streaming Inference of Multimodal Large Language Models on a Single GPU (#63)
1 parent efb983b commit 7247770

1 file changed, 4 additions and 3 deletions

README.md

Lines changed: 4 additions & 3 deletions
@@ -57,7 +57,7 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
 * 📖[CPU/Single GPU/FPGA/Mobile Inference](#CPU-Single-GPU-Inference)
 * 📖[Non Transformer Architecture](#Non-Transformer-Architecture)🔥
 * 📖[GEMM/Tensor Cores/WMMA/Parallel](#GEMM-Tensor-Cores-WMMA)
-* 📖[Position Embed/Others](#Others)
+* 📖[VLM/Position Embed/Others](#Others)
 
 ### 📖Trending LLM/VLM Topics ([©️back👆🏻](#paperlist))
 <div id="Trending-LLM-VLM-Topics"></div>
@@ -402,13 +402,14 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
 |2024.08|🔥🔥[**SpMM**] High Performance Unstructured SpMM Computation Using Tensor Cores(@ETH Zurich)|[[pdf]](https://arxiv.org/pdf/2408.11551)|⚠️|⭐️ |
 |2024.09| 🔥[**TEE**]Confidential Computing on nVIDIA H100 GPU: A Performance Benchmark Study(@phala.network)|[[pdf]](https://arxiv.org/pdf/2409.03992)|⚠️|⭐️ |
 
-### 📖Position Embed/Others ([©️back👆🏻](#paperlist))
+### 📖VLM/Position Embed/Others ([©️back👆🏻](#paperlist))
 <div id="Others"></div>
 
 |Date|Title|Paper|Code|Recom|
 |:---:|:---:|:---:|:---:|:---:|
 |2021.04|🔥[RoPE] ROFORMER: ENHANCED TRANSFORMER WITH ROTARY POSITION EMBEDDING(@Zhuiyi Technology Co., Ltd.) |[[pdf]](https://arxiv.org/pdf/2104.09864.pdf)|[[transformers]](https://huggingface.co/docs/transformers/model_doc/roformer) ![](https://img.shields.io/github/stars/huggingface/transformers.svg?style=social)|⭐️ |
-|2022.10|[ByteTransformer] A High-Performance Transformer Boosted for Variable-Length Inputs(@ByteDance&NVIDIA)|[[pdf]](https://arxiv.org/pdf/2210.03052.pdf)|[[ByteTransformer]](https://github.com/bytedance/ByteTransformer) ![](https://img.shields.io/github/stars/bytedance/ByteTransformer.svg?style=social)|⭐️ |
+|2022.10|[ByteTransformer] A High-Performance Transformer Boosted for Variable-Length Inputs(@ByteDance&NVIDIA)|[[pdf]](https://arxiv.org/pdf/2210.03052.pdf)|[[ByteTransformer]](https://github.com/bytedance/ByteTransformer) ![](https://img.shields.io/github/stars/bytedance/ByteTransformer.svg?style=social)|⭐️ |
+|2024.09|🔥[**Inf-MLLM**] Inf-MLLM: Efficient Streaming Inference of Multimodal Large Language Models on a Single GPU(@sjtu)|[[pdf]](https://arxiv.org/pdf/2409.09086)|⚠️|⭐️ |
 
 ## ©️License
