Add missing data subcommand for super3 #103

Open
mvanhorn wants to merge 2 commits into NVIDIA-NeMo:main from mvanhorn:osc/100-add-super3-data-subcommand

Summary

Fixes #100

The super3 command group imports a data package (_typer_group.py line 26) that was not checked in with the super3 training recipes release. This adds the complete data/ package mirroring the existing nano3/data/ structure:

  • data/import_/ - Import pretrain, SFT, and RL data as W&B artifacts
  • data/prep/ - Data preparation commands for pretrain, SFT, and RL stages

All files follow the nano3 data package pattern exactly, with paths updated to reference super3 recipe data_prep.py scripts.

Also fixes the .gitignore negation for the nano3 data directory (was !src/nemotron/cli/nano3/data/, corrected to !src/nemotron/cli/commands/nano3/data/) and adds the super3 negation. This gitignore issue is likely why the data package was not included in the original super3 release.
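To illustrate why a wrong negation path can silently drop a package, here is a minimal sketch of gitignore's "last matching rule wins" behavior for directory paths. It is a simplified model, not git's actual matcher (no wildcards, no parent-directory exclusion logic), and the bare `data/` ignore rule is an assumption made for illustration; only the two negation paths come from this PR.

```python
def is_ignored(dir_path: str, rules: list[str]) -> bool:
    """Simplified gitignore model for directories: the last matching rule wins."""
    target = dir_path.strip("/")
    ignored = False
    for rule in rules:
        negated = rule.startswith("!")
        pattern = (rule[1:] if negated else rule).strip("/")
        if "/" in pattern:
            # A pattern containing an inner slash is anchored to the repo root.
            matches = target == pattern
        else:
            # A bare directory name matches at any depth.
            matches = target.split("/")[-1] == pattern
        if matches:
            ignored = not negated
    return ignored


# ASSUMPTION: a bare `data/` rule is what ignores the package directories.
old_rules = ["data/", "!src/nemotron/cli/nano3/data/"]
new_rules = [
    "data/",
    "!src/nemotron/cli/commands/nano3/data/",
    "!src/nemotron/cli/commands/super3/data/",
]

for path in ("src/nemotron/cli/commands/nano3/data",
             "src/nemotron/cli/commands/super3/data"):
    print(path, is_ignored(path, old_rules), is_ignored(path, new_rules))
```

Under this model the old negation never matches the real `commands/nano3/data` directory, so the directory stays ignored. On a real checkout, `git check-ignore -v <path>` reports which rule actually applies.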

Changes

  • 12 new files in src/nemotron/cli/commands/super3/data/ (mirroring nano3 pattern)
  • 1 modified file .gitignore (fix nano3 negation path, add super3 negation)

Test plan

  • Verify nemotron super3 data --help shows prep and import subcommands
  • Verify nemotron super3 data prep --help shows pretrain, sft, rl subcommands
  • Verify nemotron super3 data import --help shows pretrain, sft, rl subcommands
  • ruff check and ruff format --check pass
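The three `--help` checks above could be scripted roughly as follows. This is a sketch, not part of the PR: the helper names `missing_subcommands` and `check_cli` are hypothetical, it assumes the `nemotron` executable is on `PATH`, and the whole-word match is loose (a subcommand name appearing in a description would also count).

```python
import re
import subprocess

# Expected subcommands per command path, taken from the test plan above.
EXPECTED = {
    ("super3", "data"): ["prep", "import"],
    ("super3", "data", "prep"): ["pretrain", "sft", "rl"],
    ("super3", "data", "import"): ["pretrain", "sft", "rl"],
}


def missing_subcommands(help_text: str, expected: list[str]) -> list[str]:
    """Return the expected names that do not appear as whole words in help_text."""
    return [
        name
        for name in expected
        if not re.search(rf"\b{re.escape(name)}\b", help_text)
    ]


def check_cli(executable: str = "nemotron") -> dict[str, list[str]]:
    """Run `<executable> <path> --help` for each path and collect failures."""
    failures = {}
    for path, expected in EXPECTED.items():
        result = subprocess.run(
            [executable, *path, "--help"], capture_output=True, text=True
        )
        missing = missing_subcommands(result.stdout, expected)
        if result.returncode != 0 or missing:
            failures[" ".join(path)] = missing
    return failures
```

Calling `check_cli()` in an environment with the package installed returns an empty dict when every help screen lists its expected subcommands.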

This contribution was developed with AI assistance (Claude Code).

The super3 command group imports a `data` package that was not checked in
(referenced in _typer_group.py line 26). This adds the complete data package
mirroring the nano3 data package structure with import and prep subcommands,
pointing to the super3 recipe data_prep scripts.

Also fixes the .gitignore negation path for nano3/data (was missing
`commands/` directory) and adds super3/data negation.

Fixes NVIDIA-NeMo#100

Signed-off-by: Matt Van Horn <matt@mattvanhorn.com>
Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

data subcommand missing for super3