Merge pull request #1082 from PopSoda2002/doc/vlm

PopSoda2002 · web-flow · commit d5a16a1643ca · 2025-12-10T20:31:06.000-08:00
[FSDP][vlm] Add B200 doc
diff --git a/examples/geo3k_vlm/README.md b/examples/geo3k_vlm/README.md
@@ -29,4 +29,7 @@ All three performed similarly, so we use the default math RM for simplicity.
 
 Our initial geo3k-specific verifier produced "format scores" (**0 and 0.9**) instead of clean binary rewards. Under **fp32**, fractional values like 0.9 can't be exactly represented, so when all samples in a group have the same reward, `reward - mean` doesn't equal zero—creating spurious gradient signal.
 
-We fixed this by switching to the default math RM with clean **binary 0/1 rewards**. If you encounter similar precision issues with non-binary rewards, you can change the reward tensor dtype from `torch.float` to `torch.float16` in `slime/ray/rollout.py` (`_post_process_rewards` method) to truncate precision artifacts.
+We fixed this by switching to the default math RM with clean **binary 0/1 rewards**. If you encounter similar precision issues with non-binary rewards, you can change the reward tensor dtype from `torch.float` to `torch.float16` in `slime/ray/rollout.py` (`_post_process_rewards` method) to truncate precision artifacts.
+
+## B200
+Blackwell GPUs such as the B200 do not currently support FlashAttention-3 (FA3), so use `--sglang-mm-attention-backend sdpa` and `--attn-implementation flash_attention_2`.
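You can reproduce the fp32 effect described in the geo3k README's reward-precision note without a GPU. The sketch below is a model, not slime's code. It emulates fp32 with Python's `struct` module, sums a group left to right, and computes a mean-centered advantage `reward - mean`. The helper names (`to_f32`, `f32_mean`, `group_advantages`) are hypothetical. Real PyTorch reductions may order the additions differently, so the exact residue can differ, but the rounding mechanism is the same.

```python
import struct
from typing import List


def to_f32(x: float) -> float:
    """Round a Python float (IEEE fp64) to the nearest fp32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def f32_mean(values: List[float]) -> float:
    """Mean with fp32 rounding after every addition and after the divide.

    Models a simple left-to-right sum; real GPU/CPU reductions may use a
    different order and therefore a different rounding pattern.
    """
    total = 0.0
    for v in values:
        total = to_f32(total + to_f32(v))
    return to_f32(total / len(values))


def group_advantages(rewards: List[float]) -> List[float]:
    """Hypothetical group-relative advantage: reward minus group mean, in fp32."""
    mean = f32_mean(rewards)
    return [to_f32(to_f32(r) - mean) for r in rewards]


if __name__ == "__main__":
    # Every sample gets the same fractional "format score": ideally zero advantage.
    print(group_advantages([0.9] * 8))
    # Clean binary rewards: sums of 0.0/1.0 are exact, so no residue appears.
    print(group_advantages([1.0] * 8))
```

Under this model, a group of eight identical 0.9 rewards gives every sample an advantage of -2^-24 (about -6e-8) instead of exactly zero. Eight identical 1.0 rewards give exactly zero. A group of four 0.9 rewards happens to round back to zero in this model, so whether the residue appears depends on group size and summation order.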
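If one launch script has to run on both Hopper and Blackwell nodes, you can apply the B200 workaround from the README conditionally. The sketch below assumes that a CUDA compute-capability major version of 10 or higher indicates a Blackwell GPU (the B200 reports 10.0). The function name `blackwell_attention_args` is hypothetical; only the two flags come from the README. On other GPUs the function returns no extra arguments, so the script keeps whatever settings it already uses.

```python
from typing import List, Tuple


def blackwell_attention_args(capability: Tuple[int, int]) -> List[str]:
    """Extra CLI args for Blackwell GPUs, which lack FA3 support (hypothetical helper).

    Assumes compute-capability major >= 10 means Blackwell (B200 is 10.0).
    """
    major, _minor = capability
    if major >= 10:
        return [
            "--sglang-mm-attention-backend", "sdpa",
            "--attn-implementation", "flash_attention_2",
        ]
    return []  # other GPUs: add nothing and keep the script's existing settings


if __name__ == "__main__":
    try:
        import torch
        cap = torch.cuda.get_device_capability() if torch.cuda.is_available() else (0, 0)
    except ImportError:
        cap = (0, 0)  # no PyTorch installed: treat as "not Blackwell"
    print(" ".join(blackwell_attention_args(cap)))
```

On a B200 node, you would append the returned list to the launch command's arguments. On an H100 (compute capability 9.0), it adds nothing.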