
Commit faba994

Adding little graphic to fp8 execution
1 parent 03a5d2e commit faba994

File tree

1 file changed: +7 -0 lines changed


page.md

Lines changed: 7 additions & 0 deletions
@@ -72,6 +72,13 @@ Starting with NVIDIA H100 GPU, GPUs have *hardware support* for 8 bit floating p

When we talk about `fp8` models, we are typically only talking about the **weights being `fp8`**. The actual execution of the model is still done in `bf16`. So all the **intermediate tensors are still in `bf16`**, and it's the underlying CUDA kernels that take in `bf16` tensors and `fp8` weights.

```
`fp8` weight
            \
             v
`bf16` input -> Linear -> `bf16` output
```

**fp8 models still use `bf16` kv cache by default** (since the kv cache stores kv values, which are intermediate tensors).
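To make the weight-only idea concrete, here is a minimal sketch of what an fp8-weight linear layer amounts to, assuming PyTorch with the `torch.float8_e4m3fn` dtype available. The `Fp8WeightLinear` class name and the explicit upcast are purely illustrative; real inference kernels fuse the dequantization into the matmul and handle scaling more carefully than this per-tensor scale.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Fp8WeightLinear(nn.Module):
    """Illustrative weight-only fp8 linear: weights stored as fp8, math done in bf16."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        w = torch.randn(out_features, in_features, dtype=torch.bfloat16)
        # Per-tensor scale so the weights fit e4m3's range (max normal value is 448).
        self.register_buffer("scale", w.abs().max() / 448.0)
        # Only the stored weights are fp8; everything else stays bf16.
        self.register_buffer("weight_fp8", (w / self.scale).to(torch.float8_e4m3fn))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # A real kernel fuses this dequantization into the matmul; the upcast is
        # written out here to show that the arithmetic happens in bf16.
        w_bf16 = self.weight_fp8.to(torch.bfloat16) * self.scale
        return F.linear(x, w_bf16)  # bf16 input -> bf16 output


layer = Fp8WeightLinear(4096, 4096)
x = torch.randn(2, 4096, dtype=torch.bfloat16)  # activations stay bf16
y = layer(x)
print(y.dtype)  # torch.bfloat16, which is also what gets written into the kv cache
```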

### fp8 bit format
