Nemo task5 6 hooks and progress reporting #9

Open
yayajjiang wants to merge 3 commits into rlops:nemo from yayajjiang:nemo-task5-6-hooks-and-progress-reporting

@yayajjiang
Collaborator

No description provided.

yayajjiang and others added 3 commits April 19, 2026 09:58
… reporting

Task 5 (F11-flag, F5-hooks):
- Add RLixHooks Protocol with 7-method interface (before/after_generation,
  before/after_training, before_weight_sync, begin/end_progress_batch)
- Add NoOpRLixHooks default implementation for standalone NeMo RL runs
- Add NemoRLRLixHooks with GPU hook placeholders (Task 7 TODOs) and
  Task 3 NCCL placeholder in before_weight_sync
- Add grpo.py stub with DO_TIME_SHARING flag and 5 hook insertion points
  in the correct per-step order
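
The interface described above could be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the method names, `RLixHooks`, `NoOpRLixHooks`, and `DO_TIME_SHARING` come from the commit message, while the method signatures, the `select_hooks` and `run_step` helpers, and the exact position of `before_weight_sync` within a step are assumptions.

```python
from typing import List, Protocol, runtime_checkable

DO_TIME_SHARING = False  # flag name from the commit; the default value is assumed


@runtime_checkable
class RLixHooks(Protocol):
    """Seven-method hook interface (argument lists are assumed)."""

    def before_generation(self, step: int) -> None: ...
    def after_generation(self, step: int) -> None: ...
    def before_training(self, step: int) -> None: ...
    def after_training(self, step: int) -> None: ...
    def before_weight_sync(self, step: int) -> None: ...
    def begin_progress_batch(self, step: int) -> None: ...
    def end_progress_batch(self, step: int) -> None: ...


class NoOpRLixHooks:
    """Default for standalone runs: every hook does nothing."""

    def before_generation(self, step: int) -> None: pass
    def after_generation(self, step: int) -> None: pass
    def before_training(self, step: int) -> None: pass
    def after_training(self, step: int) -> None: pass
    def before_weight_sync(self, step: int) -> None: pass
    def begin_progress_batch(self, step: int) -> None: pass
    def end_progress_batch(self, step: int) -> None: pass


class RecordingHooks(NoOpRLixHooks):
    """Hypothetical helper that records the order of the five step hooks."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def before_generation(self, step: int) -> None: self.calls.append("before_generation")
    def after_generation(self, step: int) -> None: self.calls.append("after_generation")
    def before_training(self, step: int) -> None: self.calls.append("before_training")
    def after_training(self, step: int) -> None: self.calls.append("after_training")
    def before_weight_sync(self, step: int) -> None: self.calls.append("before_weight_sync")


def select_hooks(do_time_sharing: bool, time_sharing_hooks: RLixHooks) -> RLixHooks:
    """Hypothetical selector: use real hooks only when the flag is set."""
    return time_sharing_hooks if do_time_sharing else NoOpRLixHooks()


def run_step(hooks: RLixHooks, step: int) -> None:
    """Hypothetical stand-in for one step with five hook insertion points."""
    hooks.before_generation(step)
    # ... rollout generation would run here ...
    hooks.after_generation(step)
    hooks.before_training(step)
    # ... policy training would run here ...
    hooks.after_training(step)
    hooks.before_weight_sync(step)
    # ... weight sync to the inference workers would run here ...
```

Keeping a no-op default means the training loop can call the hooks unconditionally, so the flag only decides which implementation is injected.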

Task 6 (F9):
- Implement begin_progress_batch / end_progress_batch state machine
- 2% bucket granularity (50 buckets), deduplicated fire-and-forget emit
- _emit_progress() isolated as overridable method for testability
- _emit_progress body is TODO pending Task 7 scheduler actor wiring
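
One plausible reading of the bucketed reporting above is sketched here. Only the method names, the 2% granularity (50 buckets), the deduplication, and the overridable `_emit_progress` come from the commit message. The `ProgressTracker` class, its constructor argument, the cumulative counting, and the error handling are assumptions made for illustration.

```python
from typing import List

BUCKET_PERCENT = 2
NUM_BUCKETS = 100 // BUCKET_PERCENT  # 50 buckets


class ProgressTracker:
    """Hypothetical tracker: begin/end bracket a batch, emits are deduplicated."""

    def __init__(self, total: int) -> None:
        if total <= 0:
            raise ValueError("total must be positive")
        self._total = total      # assumed: expected number of items overall
        self._done = 0           # items completed so far
        self._last_bucket = 0    # last bucket that was emitted (0 means none yet)
        self._open = False       # True between begin and end

    def begin_progress_batch(self) -> None:
        if self._open:
            raise RuntimeError("previous batch was not ended")
        self._open = True

    def end_progress_batch(self, completed: int) -> None:
        if not self._open:
            raise RuntimeError("no batch in progress")
        self._open = False
        self._done += completed
        # Integer arithmetic avoids float rounding at bucket boundaries.
        bucket = min(self._done, self._total) * NUM_BUCKETS // self._total
        if bucket == self._last_bucket:
            return  # same 2% bucket as last time: skip the duplicate emit
        self._last_bucket = bucket
        try:
            self._emit_progress(bucket * BUCKET_PERCENT)
        except Exception:
            pass  # fire-and-forget: a failed report must not stop training

    def _emit_progress(self, percent: int) -> None:
        """Override point; the source leaves the real body as a TODO."""


class RecordingTracker(ProgressTracker):
    """Test double that captures emitted percentages instead of sending them."""

    def __init__(self, total: int) -> None:
        super().__init__(total)
        self.emitted: List[int] = []

    def _emit_progress(self, percent: int) -> None:
        self.emitted.append(percent)
```

Isolating `_emit_progress` lets a test subclass such as `RecordingTracker` observe the emits without a scheduler actor, which matches the testability goal stated in the commit.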

Tests: 30 unit tests, all passing, no GPU/Ray/vLLM required

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tep_target_estimate bootstrap

- grpo.py: begin_progress_batch now called before before_generation each step
- TASK5_6_HOOKS.md: add chicken-and-egg problem explanation and two-mechanism solution
- test: add test_begin_progress_batch_called_before_before_generation (31 tests total)
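
An ordering check like the one named above could look roughly like this. The test name is taken from the commit; the `run_step` function is a hypothetical stand-in for the per-step body of the grpo.py stub, and the hook signatures are assumed.

```python
from unittest.mock import Mock


def run_step(hooks, step: int) -> None:
    """Hypothetical stand-in for the per-step body of the GRPO loop."""
    hooks.begin_progress_batch(step)  # opened first, per the second commit
    hooks.before_generation(step)
    # ... generation, training, and weight sync would follow here ...
    hooks.after_generation(step)


def test_begin_progress_batch_called_before_before_generation() -> None:
    hooks = Mock()
    run_step(hooks, step=0)
    # Each entry in method_calls is a (name, args, kwargs) triple.
    names = [call[0] for call in hooks.method_calls]
    assert names.index("begin_progress_batch") < names.index("before_generation")
```

Using a `Mock` keeps the test free of GPU, Ray, and vLLM dependencies, in line with the note in the first commit.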

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>