Merged
7 changes: 3 additions & 4 deletions README.md
@@ -84,10 +84,9 @@ Leveraging a multi-role distributed architecture with Ray for flexible resource
[RAFT++](https://alibaba.github.io/ROLL/docs/English/UserGuide/algorithms/RAFT_Plus_Plus)
[StarPO](https://alibaba.github.io/ROLL/docs/English/UserGuide/algorithms/agentic_StarPO)

-#### Beckend
+#### Backend
[DeepSpeed](https://alibaba.github.io/ROLL/docs/English/UserGuide/backend/deepspeed)
[Megatron](https://alibaba.github.io/ROLL/docs/English/UserGuide/backend/megatron)
[LoRA](https://alibaba.github.io/ROLL/docs/English/UserGuide/backend/lora)
[Megatron](https://alibaba.github.io/ROLL/docs/English/UserGuide/backend/megatron)
[vLLM](https://alibaba.github.io/ROLL/docs/English/UserGuide/backend/vllm)
[SGLang](https://alibaba.github.io/ROLL/docs/English/UserGuide/backend/sglang)

@@ -119,7 +118,7 @@ Leveraging a multi-role distributed architecture with Ray for flexible resource
* Inference/Generation supports vLLM, SGLang.
* Training supports DeepSpeed (ZeRO), Megatron-LM 5D parallelism (mcore-adapter, dp/tp/pp/cp/ep), FSDP under implementation.
* Extreme offload/reload capabilities.
-* Supports LoRA training.
+* Supports [LoRA](https://alibaba.github.io/ROLL/docs/English/UserGuide/backend/lora) training.
* Supports FP8 rollout (FP8 inference for LLM as judge, FP8 rollout with BF16 training under development).
* **AutoDeviceMapping:** Supports custom device mapping for different roles, flexibly managing colocated and disaggregated deployments.
* **Observability:** Integrated with SwanLab / WandB / TensorBoard, tracking of performance for each domain and reward type.
2 changes: 1 addition & 1 deletion docs_roll/docs/English/UserGuide/algorithms/GRPO.md
@@ -26,7 +26,7 @@ adv_estimator: "grpo"
ppo_epochs: 1
use_kl_loss: true
kl_loss_coef: 0.001
-loss_agg_mode: "seq-mean-token-sum"
+loss_agg_mode: "seq-mean-token-mean"

# ppo related
# advantage
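The `loss_agg_mode` change above swaps per-sequence token *sums* for per-sequence token *means* before averaging over the batch, which stops long sequences from dominating the loss. A minimal pure-Python sketch of the two modes (the function name and list-based shapes are illustrative only; ROLL's actual implementation operates on masked tensors):

```python
def aggregate_loss(token_losses, mode):
    """Aggregate per-token losses for a batch of variable-length sequences.

    token_losses: list of sequences, each a list of per-token loss values.
    Illustrative sketch, not ROLL's actual code.
    """
    if mode == "seq-mean-token-sum":
        # Sum tokens within each sequence: long sequences dominate the batch mean.
        per_seq = [sum(seq) for seq in token_losses]
    elif mode == "seq-mean-token-mean":
        # Average tokens within each sequence: every sequence weighs equally.
        per_seq = [sum(seq) / len(seq) for seq in token_losses]
    else:
        raise ValueError(f"unknown loss_agg_mode: {mode}")
    # "seq-mean": average the per-sequence values over the batch.
    return sum(per_seq) / len(per_seq)

# With a uniform token loss of 1.0, "token-sum" scales with sequence length
# while "token-mean" does not:
batch = [[1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
print(aggregate_loss(batch, "seq-mean-token-sum"))   # 3.0
print(aggregate_loss(batch, "seq-mean-token-mean"))  # 1.0
```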
@@ -26,7 +26,7 @@ adv_estimator: "grpo"
ppo_epochs: 1
use_kl_loss: true
kl_loss_coef: 0.001
-loss_agg_mode: "seq-mean-token-sum"
+loss_agg_mode: "seq-mean-token-mean"

# ppo related
# advantage