Commit d5a16a1

Merge pull request #1082 from PopSoda2002/doc/vlm
[FSDP][vlm] Add B200 doc
2 parents c728d82 + d770b66 commit d5a16a1

1 file changed (+4, -1 lines changed)

examples/geo3k_vlm/README.md

Lines changed: 4 additions & 1 deletion
@@ -29,4 +29,7 @@ All three performed similarly, so we use the default math RM for simplicity.
 
 Our initial geo3k-specific verifier produced "format scores" (**0 and 0.9**) instead of clean binary rewards. Under **fp32**, fractional values like 0.9 can't be exactly represented, so when all samples in a group have the same reward, `reward - mean` doesn't equal zero—creating spurious gradient signal.
 
-We fixed this by switching to the default math RM with clean **binary 0/1 rewards**. If you encounter similar precision issues with non-binary rewards, you can change the reward tensor dtype from `torch.float` to `torch.float16` in `slime/ray/rollout.py` (`_post_process_rewards` method) to truncate precision artifacts.
+We fixed this by switching to the default math RM with clean **binary 0/1 rewards**. If you encounter similar precision issues with non-binary rewards, you can change the reward tensor dtype from `torch.float` to `torch.float16` in `slime/ray/rollout.py` (`_post_process_rewards` method) to truncate precision artifacts.
+
+## B200
+Blackwell currently does not support fa3, we need to use `--sglang-mm-attention-backend sdpa` and `--attn-implementation flash_attention_2`
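
The fp32 effect described in the README context of this diff can be illustrated with a small standalone sketch. It is not code from `slime`: the helpers `to_f32`, `f32_mean`, and `centered` are hypothetical names, fp32 is emulated with the standard-library `struct` module, and the mean uses plain left-to-right summation. PyTorch may reduce in a different order, so which group sizes leave a nonzero residual there is an assumption, not something this sketch establishes.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def f32_mean(values):
    """Mean with every intermediate rounded to fp32 (left-to-right sum)."""
    total = 0.0
    for v in values:
        total = to_f32(total + to_f32(v))
    return to_f32(total / len(values))


def centered(values):
    """Return reward - mean for each reward, computed in emulated fp32."""
    mean = f32_mean(values)
    return [to_f32(to_f32(v) - mean) for v in values]


if __name__ == "__main__":
    # 0.9 has no exact binary representation, so fp32 stores a nearby value.
    print("0.9 stored as fp32:", to_f32(0.9))
    # Identical binary rewards: the mean is exact, so every residual is zero.
    print("binary rewards:", centered([1.0, 1.0, 1.0]))
    # Identical 0.9 rewards: rounding in the sum can leave a tiny residual.
    print("format scores: ", centered([0.9, 0.9, 0.9]))
```

In a group where every sample has the same reward, the centered values should carry no signal at all. Rewards of exactly 0 or 1 keep that property in this emulation, while a group of identical 0.9 scores can leave a residual on the order of one fp32 rounding step.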
