Commit 1c16ed5
Show every distinct GRPO task type in the first N rows (#238)
Previously the GRPO section used a round-robin interleave that split
rows into a vLLM bucket and a non-vLLM bucket and walked each bucket
in turn. That guaranteed uniqueness within each bucket, but not across
the whole section: the top 10 rows were all vLLM (with GSM8K and DAPO
repeating because the vLLM bucket only had four distinct types),
while the first non-vLLM distinct type (Sudoku) did not show up until
row 11. Duplicates were also grouped by bucket instead of ordered by
popularity across the whole section.
Replace _interleave_by_task with _unique_types_first:
1. Sort every GRPO row by (popularity, count_ok, has_vllm) descending,
   so popularity dominates, count_ok breaks popularity ties, and
   has_vllm is only a minor tiebreaker between rows with the same
   popularity and count_ok.
2. Walk the sorted list once. The first time a task_type is seen,
   that row becomes the representative of that type; every subsequent
   row with the same task_type is a duplicate.
3. Emit all representatives first (already in popularity order from
   step 1), then all duplicates (also in popularity order).
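The three steps above can be sketched roughly as follows. This is an illustration only, not the code from this commit: it assumes each row is a plain dict carrying the task_type, popularity, count_ok and has_vllm fields named above, and the function name, the "name" field and the sample rows are hypothetical.

```python
from typing import Any

Row = dict[str, Any]  # assumed row shape; the real row type is not shown in this commit message


def unique_types_first_sketch(rows: list[Row]) -> list[Row]:
    """Hypothetical sketch of the ordering described in steps 1-3."""
    # Step 1: sort by (popularity, count_ok, has_vllm), all descending.
    ordered = sorted(
        rows,
        key=lambda r: (r["popularity"], r["count_ok"], r["has_vllm"]),
        reverse=True,
    )

    # Step 2: a single pass; the first row seen for a task_type is its representative.
    seen: set[str] = set()
    representatives: list[Row] = []
    duplicates: list[Row] = []
    for row in ordered:
        if row["task_type"] in seen:
            duplicates.append(row)
        else:
            seen.add(row["task_type"])
            representatives.append(row)

    # Step 3: representatives first, then duplicates; both keep the order from step 1.
    return representatives + duplicates


# Made-up rows, for illustration only.
rows = [
    {"name": "a", "task_type": "gsm8k", "popularity": 90, "count_ok": 1, "has_vllm": True},
    {"name": "b", "task_type": "gsm8k", "popularity": 80, "count_ok": 1, "has_vllm": True},
    {"name": "c", "task_type": "sudoku", "popularity": 70, "count_ok": 1, "has_vllm": False},
    {"name": "d", "task_type": "dapo", "popularity": 60, "count_ok": 1, "has_vllm": True},
]
ordered_names = [r["name"] for r in unique_types_first_sketch(rows)]
```

With these sample rows the second gsm8k row is pushed behind the first sudoku and dapo rows even though it is more popular than either, which is the cross-bucket behaviour the old interleave could not guarantee.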
Result for the current GRPO pool (11 distinct task types):
  Row 1       GSM8K Math + vLLM (Llama3.1 8B)
  Row 2       Sudoku (NeMo Gym Sudoku)
  Row 3       Multi Environment (NeMo Gym Multi Environment)
  Row 4       2048 Game (gpt oss BF16 20B)
  Row 5       Minesweeper Game (gpt oss 20B)
  Row 6       Auto Kernel Creation (gpt oss 20B)
  Row 7       DAPO Math + vLLM (Qwen3 8B FP8)
  Row 8       ORPO (Llama3 8B)
  Row 9       Wordle + vLLM (Openenv wordle)
  Row 10      Vision Math + vLLM (Qwen2.5 VL 7B)
  Row 11      DPO (Zephyr 7B)
  Rows 12-29  duplicates in popularity order
Rows 1-11 now cover every distinct task type in the section, which
was the stated goal. Non-GRPO sections are untouched.
Parent commit: 15899e0
2 files changed, +72 -65 lines changed