Commit 35c0f42
Detect TRL trainer class: route DPO/ORPO, mark GRPO cross-cuts
Add detect_trainer_class() which scans code cells for TRL trainer
class names (GRPOTrainer, DPOTrainer, ORPOTrainer, KTOTrainer,
RewardTrainer, PPOTrainer) and returns the highest-priority match.
This is the authoritative signal for what kind of RL/preference
training a notebook runs, regardless of the filename.
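A minimal sketch of what such a scan could look like, not the committed implementation. It assumes the notebook is already loaded as an nbformat-style dict (`cells`, `cell_type`, `source`) and that the priority order matches the order the class names are listed in above; the commit message does not state the actual order, and `TRAINER_PRIORITY` is a hypothetical name.

```python
import re

# ASSUMPTION: priority follows the order the class names are listed in the
# commit message; the real order in the commit may differ.
TRAINER_PRIORITY = [
    "GRPOTrainer",
    "DPOTrainer",
    "ORPOTrainer",
    "KTOTrainer",
    "RewardTrainer",
    "PPOTrainer",
]


def detect_trainer_class(notebook):
    """Return the highest-priority TRL trainer class named in a code cell, or None."""
    found = set()
    for cell in notebook.get("cells", []):
        # Only code cells count; a mention in markdown is not a signal.
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        for name in TRAINER_PRIORITY:
            # Word boundaries avoid matching a longer identifier that merely
            # contains the class name.
            if re.search(rf"\b{name}\b", source):
                found.add(name)
    for name in TRAINER_PRIORITY:
        if name in found:
            return name
    return None
```

Because only code cells are scanned, a notebook that merely mentions GRPOTrainer in a markdown cell is not classified as a GRPO notebook.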
Three behavior changes driven by the trainer class:
1. DPO / ORPO routing to the GRPO section
Zephyr (7B) DPO and Llama3 (8B) ORPO were landing in the Mistral
and Llama sections because they follow the filename-based
architecture routing. They are now force-routed to the GRPO &
Reinforcement Learning section with Type set to "DPO" / "ORPO"
respectively.
2. (GRPO RL) suffix on cross-cut sections
A notebook that trains with GRPOTrainer may also appear in
cross-cutting sections (Vision (Multimodal), Embedding, OCR). In
those non-GRPO sections the Type now gets a "(GRPO RL)" suffix so
readers can tell it is an RL notebook and not a plain SFT one.
Examples in the Vision (Multimodal) section:
* Qwen2.5 VL (7B) Vision Math + vLLM (GRPO RL)
* Qwen3 VL (8B) Vision Math (GRPO RL)
* Qwen3.5 (4B) Vision Math (GRPO RL)
* Gemma3 (4B) Vision Math (GRPO RL)
3. Row rendering moved into the per-section loop
Previously the row string was built once and appended to every
section the notebook belonged to. The row is now built inside the
`for section_name in data["sections"]` loop so the Type column can
vary per section (to append "(GRPO RL)" only in cross-cut sections).
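The per-section rendering described above can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: only `data["sections"]` comes from the commit message, while `data["type"]`, `type_for_section`, `render_rows`, and the two-column row format are hypothetical.

```python
# Section name taken from the commit message.
GRPO_SECTION = "GRPO & Reinforcement Learning"


def type_for_section(base_type, trainer, section_name):
    """Append "(GRPO RL)" only when a GRPO notebook is shown outside the GRPO section."""
    if trainer == "GRPOTrainer" and section_name != GRPO_SECTION:
        return f"{base_type} (GRPO RL)"
    return base_type


def render_rows(name, data, trainer):
    """Build one row per section so the Type column can differ between sections."""
    rows = {}
    for section_name in data["sections"]:
        # The row is built inside the loop, not once up front, so each
        # section gets its own Type label.
        type_label = type_for_section(data["type"], trainer, section_name)
        # HYPOTHETICAL row format; the real table has its own columns.
        rows[section_name] = f"| {name} | {type_label} |"
    return rows
```

Building the row once before the loop would force the same Type string into every section, which is the limitation this change removes.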
The vLLM suffix is now only appended for GRPO-class training so the
new DPO / ORPO rows stay clean.
1 parent aa0845b commit 35c0f42
2 files changed, +114 -33 lines changed