Commit 76d1d1f

docs: fix vespo training example
1 parent f0c2490 commit 76d1d1f

1 file changed (+2, -1 lines)

docs/source/paper_index.md

Lines changed: 2 additions & 1 deletion
@@ -592,7 +592,8 @@ from trl import GRPOConfig
 
 training_args = GRPOConfig(
     loss_type="vespo",
-    importance_sampling_level="token",
+    use_vllm=True, # or False if not using any token-level `vllm_importance_sampling_correction` methods
+    vllm_importance_sampling_mode="token_truncate", # default correction mode for VESPO, `token_mask` also supported
     vespo_k_pos=2.0, # Power exponent (c1 in paper Section 3.4) for positive advantages
     vespo_lambda_pos=3.0, # Decay factor (c2 in paper Section 3.4) for positive advantages
     vespo_k_neg=3.0, # Power exponent (c1 in paper Section 3.4) for negative advantages
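The fix replaces `importance_sampling_level="token"` with `vllm_importance_sampling_mode`, which selects how per-token importance weights between the training policy and the vLLM sampler are corrected. The following is a minimal, framework-free sketch of the two token-level modes named in the comment. It assumes that `token_truncate` clips each ratio pi_train/pi_vllm at a cap and that `token_mask` zeroes out tokens whose ratio exceeds the cap. The helper `is_weights` and the cap value of 2.0 are hypothetical illustrations, not TRL's API, implementation, or defaults.

```python
import math
from typing import List


def is_weights(
    logp_train: List[float],
    logp_vllm: List[float],
    cap: float = 2.0,  # hypothetical cap, not necessarily TRL's default
    mode: str = "token_truncate",
) -> List[float]:
    """Hypothetical helper: per-token importance weights pi_train / pi_vllm.

    Sketch of truncated vs. masked token-level importance sampling
    correction; assumed semantics, not TRL's implementation.
    """
    if len(logp_train) != len(logp_vllm):
        raise ValueError("log-prob sequences must have the same length")
    weights = []
    for lt, lv in zip(logp_train, logp_vllm):
        ratio = math.exp(lt - lv)  # pi_train(token) / pi_vllm(token)
        if mode == "token_truncate":
            weights.append(min(ratio, cap))  # clip large ratios down to the cap
        elif mode == "token_mask":
            weights.append(ratio if ratio <= cap else 0.0)  # drop tokens above the cap
        else:
            raise ValueError(f"unsupported mode: {mode}")
    return weights


if __name__ == "__main__":
    # Token probabilities under the trainer and the vLLM sampler (made-up values).
    lt = [math.log(0.5), math.log(0.9), math.log(0.2)]
    lv = [math.log(0.5), math.log(0.3), math.log(0.4)]
    print(is_weights(lt, lv, mode="token_truncate"))
    print(is_weights(lt, lv, mode="token_mask"))
```

With these inputs the raw ratios are roughly 1.0, 3.0 and 0.5. Truncation keeps the second token but caps its weight at 2.0. Masking removes it entirely by giving it weight 0.0. Truncation keeps every token's gradient signal, while masking discards tokens whose sampler and trainer probabilities disagree too much.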
