Commit 35c0f42

Detect TRL trainer class: route DPO/ORPO, mark GRPO cross-cuts
Add detect_trainer_class() which scans code cells for TRL trainer class names (GRPOTrainer, DPOTrainer, ORPOTrainer, KTOTrainer, RewardTrainer, PPOTrainer) and returns the highest-priority match. This is the authoritative signal for what kind of RL/preference training a notebook runs, regardless of the filename.

Three behavior changes driven by the trainer class:

1. DPO / ORPO routing to the GRPO section

   Zephyr (7B) DPO and Llama3 (8B) ORPO were landing in the Mistral and Llama sections because they follow the filename-based architecture routing. They are now force-routed to the GRPO & Reinforcement Learning section with Type set to "DPO" / "ORPO" respectively.

2. (GRPO RL) suffix on cross-cut sections

   A notebook that trains with GRPOTrainer may also appear in cross-cutting sections (Vision (Multimodal), Embedding, OCR). In those non-GRPO sections the Type now gets a "(GRPO RL)" suffix so readers can tell it is an RL notebook and not a plain SFT one.

   Examples in the Vision (Multimodal) section:

   * Qwen2.5 VL (7B) Vision Math + vLLM (GRPO RL)
   * Qwen3 VL (8B) Vision Math (GRPO RL)
   * Qwen3.5 (4B) Vision Math (GRPO RL)
   * Gemma3 (4B) Vision Math (GRPO RL)

3. Row rendering moved into the per-section loop

   Previously the row string was built once and appended to every section the notebook belonged to. The row is now built inside the `for section_name in data["sections"]` loop so the Type column can vary per section (to append "(GRPO RL)" only in cross-cut sections). The vLLM suffix is now only appended for GRPO-class training so the new DPO / ORPO rows stay clean.
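For readers without the diff at hand, the behavior described in this message can be sketched roughly as follows. This is an illustrative reconstruction, not the committed code. The priority order, the word-boundary matching, the helper names `sections_for` and `type_for_section`, the `uses_vllm` flag, the " + vLLM" suffix format, and the assumption that force-routing replaces the filename-based sections are all guesses made for the example. Only `detect_trainer_class`, the trainer class names, the section name, and the "(GRPO RL)" suffix come from the message itself.

```python
import re

# Assumed priority order: the message lists these names and mentions a
# "highest-priority match", but does not state the actual ordering.
TRAINER_PRIORITY = [
    "GRPOTrainer", "DPOTrainer", "ORPOTrainer",
    "KTOTrainer", "RewardTrainer", "PPOTrainer",
]

GRPO_SECTION = "GRPO & Reinforcement Learning"

# Trainers that are force-routed to the GRPO section, with their Type label.
FORCE_ROUTED = {"DPOTrainer": "DPO", "ORPOTrainer": "ORPO"}


def detect_trainer_class(code_cells):
    """Return the highest-priority trainer class named in the code cells, or None."""
    source = "\n".join(code_cells)
    for name in TRAINER_PRIORITY:
        # Word boundaries avoid matching a longer identifier that merely
        # contains the trainer name.
        if re.search(rf"\b{name}\b", source):
            return name
    return None


def sections_for(trainer, filename_sections):
    """Hypothetical helper: override filename-based routing for DPO / ORPO."""
    if trainer in FORCE_ROUTED:
        return [GRPO_SECTION]
    return list(filename_sections)


def type_for_section(base_type, trainer, section_name, uses_vllm=False):
    """Hypothetical helper: build the Type column for one section's row."""
    if trainer in FORCE_ROUTED:
        # DPO / ORPO rows stay clean: no vLLM suffix, no RL marker.
        return FORCE_ROUTED[trainer]
    type_label = base_type
    if trainer == "GRPOTrainer":
        if uses_vllm:
            type_label += " + vLLM"
        if section_name != GRPO_SECTION:
            # Mark the notebook as RL only where it appears as a cross-cut.
            type_label += " (GRPO RL)"
    return type_label
```

Computing the Type per section, instead of once per notebook, is what lets the same notebook read "Vision Math" in the GRPO section and "Vision Math (GRPO RL)" in the Vision (Multimodal) section.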
1 parent aa0845b commit 35c0f42

2 files changed: +114 -33 lines changed
