Conversation

@MukeshK17

Fixes #2187 (CLI args not setting currently for max_steps, memory/time code profiling)

What does this PR do?

Allows --train.max_steps to be passed via the CLI for pretraining runs.
Previously, this argument caused a validation error in pretrain.py.

The argument is now accepted; when it is provided, a warning clarifies that it is
intended for profiling, debugging, or minimal sanity-check runs.

Motivation

Users may want to run a very small number of training steps (e.g. max_steps=1)
to measure memory usage or execution time without committing to full pretraining.

Changes

  • Removed train.max_steps from the unsupported argument list in pretrain.py
  • Added a warning when train.max_steps is provided, clarifying its intended usage (see the sketch below)
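
A minimal sketch of how the change might look in pretrain.py; TrainArgs, validate_args, and the exact warning text here are illustrative assumptions, not the repository's actual code:

```python
# Hypothetical sketch of the pretrain.py change; names and warning text
# are assumptions, not the actual source.
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrainArgs:
    max_steps: Optional[int] = None


def validate_args(train: TrainArgs) -> None:
    # train.max_steps is no longer rejected as unsupported; instead, warn
    # that it is meant for profiling, debugging, or sanity-check runs.
    if train.max_steps is not None:
        warnings.warn(
            "train.max_steps is intended for profiling, debugging, or "
            "minimal sanity-check runs, not for full pretraining."
        )
```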

Tests

  • Core unit tests pass locally
  • HF tokenizer tests requiring gated models fail locally without HF_TOKEN
  • Optional dependency tests (datasets, bitsandbytes, lm_eval) not run locally
  • CI is expected to cover gated and optional test paths

@bhimrazy
Collaborator

LGTM 👍
@MukeshK17, could you also include a test for this? Maybe in test_cli or wherever it fits best. That would help ensure this behavior stays covered.

@MukeshK17
Author

Thanks! I’ll add a CLI-level test to ensure --train.max_steps is accepted and doesn’t raise a validation error. Will update the PR shortly.

@MukeshK17
Author

Added a CLI test in test_cli.py to cover --train.max_steps acceptance and warning behavior.
Thanks for the suggestion!
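
For context, a minimal, self-contained sketch of what such a test might look like; the stubbed TrainArgs and validate_args below are assumptions standing in for the real code path, not the test that was actually added:

```python
# Hypothetical sketch of the kind of test added in test_cli.py; the stubs
# below stand in for the real pretrain code path.
import warnings
from dataclasses import dataclass
from typing import Optional

import pytest


@dataclass
class TrainArgs:
    max_steps: Optional[int] = None


def validate_args(train: TrainArgs) -> None:
    # Stand-in for the real validation logic in pretrain.py.
    if train.max_steps is not None:
        warnings.warn("train.max_steps is intended for profiling or debugging runs.")


def test_max_steps_is_accepted_and_warns():
    # Providing max_steps should not raise a validation error and should
    # emit the advisory warning.
    with pytest.warns(UserWarning, match="max_steps"):
        validate_args(TrainArgs(max_steps=1))
```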
