fix: correct CSV extraction in scaling_laws.sh #580
Open
geopti wants to merge 2 commits into karpathy:master
Conversation
Two bugs caused the parameter columns and tokens_trained to be silently
empty or wrong in the results CSV:
1. Parameter grep patterns did not account for the padded key format.
base_train.py prints parameters as `{key:24s}: {value:,}`, e.g.
`wte : 33,554,432`, so patterns like `grep "wte:"`
never matched. Fixed by using `grep -P "wte\s+:"` to handle the spaces.
2. tokens_trained was hardcoded as `NUM_ITERS * 524288`, but the batch
size is auto-computed by base_train.py and may differ from 524288
depending on the FLOPs budget and model size. Fixed by extracting the
actual value from the log line "Total number of training tokens: X".
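A minimal sketch of both extractions against a faked log excerpt (the log contents below are illustrative, mimicking the `{key:24s}: {value:,}` format; `grep -P` assumes GNU grep):

```shell
# Fake log excerpt in base_train.py's padded "{key:24s}: {value:,}" style
log=$(printf '%s\n' \
  'wte                     : 33,554,432' \
  'Total number of training tokens: 5,242,880')

# Bug 1 fix: keys are padded, so match "wte" + whitespace + colon,
# not the literal "wte:"
wte=$(printf '%s\n' "$log" | grep -P '^wte\s+:' \
  | awk -F: '{gsub(/[ ,]/, "", $2); print $2}')

# Bug 2 fix: read the actual token count from the log
# instead of computing NUM_ITERS * 524288
tokens=$(printf '%s\n' "$log" | grep 'Total number of training tokens:' \
  | awk -F: '{gsub(/[ ,]/, "", $2); print $2}')

echo "$wte $tokens"
```

The `awk` step strips the padding spaces and thousands separators so the CSV gets plain integers.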
…g batch size

Same bug as `scaling_laws.sh`: `TOKENS_TRAINED` was computed as `NUM_ITERS * 524288`, hardcoding the default total batch size. When `base_train` auto-computes a different batch size, the value is wrong. Fix by reading "Total number of training tokens:" directly from the training log.
Bug fixes

Two bugs in `runs/scaling_laws.sh` cause the results CSV to be silently wrong.

Bug 1 — parameter columns always empty

`base_train.py` prints parameter counts with 24-character key padding (`{key:24s}: {value:,}`, e.g. `wte                     : 33,554,432`). The grep patterns used `"wte:"`, `"value_embeds:"`, etc., which never match because there are spaces between the key and the colon, so some parameter columns in the CSV are silently empty.

Fix: use `grep -P "wte\s+:"` to handle the padding.

Bug 2 — tokens_trained always wrong

`tokens_trained` was computed as `NUM_ITERS * 524288`, hardcoding the default batch size. But `base_train.py` auto-computes the optimal batch size based on the FLOPs budget and model size; it may differ from 524288, especially for small models at small FLOPs budgets.

Fix: extract `tokens_trained` directly from the log line `"Total number of training tokens: X"`.
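As a sketch of the Bug 2 fix (variable and file names here are hypothetical, not necessarily the script's actual ones):

```shell
# Hypothetical log file standing in for the real base_train.py output.
LOG_FILE=$(mktemp)
printf 'Total number of training tokens: 4,194,304\n' > "$LOG_FILE"

# Before (wrong whenever base_train.py picks a non-default batch size):
#   TOKENS_TRAINED=$((NUM_ITERS * 524288))

# After: read the value the training run actually reports,
# stripping the thousands separators for the CSV.
TOKENS_TRAINED=$(grep 'Total number of training tokens:' "$LOG_FILE" \
  | awk -F: '{gsub(/[ ,]/, "", $2); print $2}')

echo "$TOKENS_TRAINED"
rm -f "$LOG_FILE"
```

This keeps the CSV consistent with whatever batch size the training run actually used.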