--- a/README.md
+++ b/README.md
@@ -101,7 +101,7 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
 |Date|Title|Paper|Code|Recom|
 |:---:|:---:|:---:|:---:|:---:|
 |2022.06|🔥[**ZeroQuant**] Efficient and Affordable Post-Training Quantization for Large-Scale Transformers(@Microsoft) |[[pdf]](https://arxiv.org/pdf/2206.01861.pdf)|[[DeepSpeed]](https://github.com/microsoft/DeepSpeed)|⭐️⭐️ |
-|2022.08|[FP8-Quantization] FP8 Quantization: The Power of the Exponent(@Qualcomm AI Research) |[[pdf]](https://arxiv.org/pdf/2208.09225.pdf)|⚠️|⭐️ |
+|2022.08|[FP8-Quantization] FP8 Quantization: The Power of the Exponent(@Qualcomm AI Research) |[[pdf]](https://arxiv.org/pdf/2208.09225.pdf)|[[FP8-quantization]](https://github.com/Qualcomm-AI-research/FP8-quantization)|⭐️ |
 |2022.08|[LLM.int8()] 8-bit Matrix Multiplication for Transformers at Scale(@Facebook AI Research etc) |[[pdf]](https://arxiv.org/pdf/2208.07339.pdf)|[[bitsandbytes]](https://github.com/timdettmers/bitsandbytes)|⭐️ |
 |2022.10|🔥[**GPTQ**] GPTQ: ACCURATE POST-TRAINING QUANTIZATION FOR GENERATIVE PRE-TRAINED TRANSFORMERS(@IST Austria etc) |[[pdf]](https://arxiv.org/pdf/2210.17323.pdf)|[[gptq]](https://github.com/IST-DASLab/gptq)|⭐️⭐️ |
 |2022.11|🔥[**WINT8/4**] Who Says Elephants Can’t Run: Bringing Large Scale MoE Models into Cloud Scale Production(@NVIDIA&Microsoft) |[[pdf]](https://arxiv.org/pdf/2211.10017.pdf)|[[FasterTransformer]](https://github.com/NVIDIA/FasterTransformer)|⭐️⭐️ |