
Commit 8cc325a

upd (#201)
1 parent b5c7cb6 commit 8cc325a

File tree

1 file changed: +0 -3 lines changed

blog/2025-09-21-petit-amdgpu.md

Lines changed: 0 additions & 3 deletions

@@ -4,9 +4,6 @@ author: "Haohui Mai, Lei Zhang"
 date: "September 21, 2025"
 previewImg: /images/blog/petit/petit-facade.png
 ---
-
-Haohui Mai (CausalFlow.ai), Lei Zhang (AMD)
-
 ## Introduction
 
 As frontier large language models (LLMs) continue scaling to unprecedented sizes, they demand increasingly more compute power and memory bandwidth from GPUs. Both GPU manufacturers and model developers are shifting toward low-precision floating-point formats. FP4 (4-bit floating point) quantization has emerged as a particularly compelling solution—for instance, FP4-quantized [Llama 3.3 70B](https://huggingface.co/nvidia/Llama-3.3-70B-Instruct-FP4) models achieve a 3.5x reduction in model size while maintaining minimal quality degradation on benchmarks like [MMLU](https://arxiv.org/abs/2009.03300).
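As a back-of-the-envelope check on the 3.5x figure quoted in the diff above, here is a short Python sketch. It assumes an NVFP4-style layout (4-bit weight codes plus one 8-bit FP8 scale shared by each 16-element block) against an FP16 baseline; the quoted checkpoint's actual packing and non-weight tensors may differ, so the numbers are approximate rather than a description of the released model.

```python
# Rough size estimate: FP4 weights with per-block scales vs. an FP16 baseline.
# Assumed layout (not taken from the post): 4-bit codes, one 8-bit scale
# per 16-element block, as in NVFP4-style quantization schemes.

params = 70e9                 # Llama 3.3 70B parameter count
baseline_bits = 16            # FP16/BF16 weights
fp4_bits = 4                  # packed 4-bit weight codes
scale_bits_per_block = 8      # one FP8 (E4M3) scale per block
block_size = 16               # elements sharing a scale

fp4_total_bits = fp4_bits + scale_bits_per_block / block_size  # 4.5 bits/weight
reduction = baseline_bits / fp4_total_bits

print(f"baseline: {params * baseline_bits / 8 / 1e9:.0f} GB")   # ~140 GB
print(f"FP4:      {params * fp4_total_bits / 8 / 1e9:.1f} GB")  # ~39.4 GB
print(f"reduction: {reduction:.2f}x")                           # ~3.56x
```

Under these assumptions the scale metadata costs about 0.5 extra bits per weight, which is why the effective reduction lands near 3.5x rather than the naive 4x.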
