Open a High-Score (HS) Track #106

@Triang-jyed-driung

Rationale: The NanoGPT speedrun has been effective at optimizing training speed, but at the expense of code readability. For instance, without a solid understanding of floating-point bit layouts, it is hard to see how the following code works:

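    # Rebuild 32-bit words: the bf16 bits become the high 16 bits, `mantissa` supplies the low 16.
    acc_m_u32 = (acc_bf16_view_u16.to(torch.uint32) << 16) | mantissa.to(torch.uint32)
    # Reinterpret the same bits as float32 and apply weight decay, then the update, in place.
    acc_m_u32.view(torch.float32).mul_(1 - eff_weight_decay)
    acc_m_u32.view(torch.float32).add_(other=v, alpha=-eff_lr)
    # Split the result back: high 16 bits to the bf16 view, low 16 bits to `mantissa`.
    acc_bf16_view_u16.copy_((acc_m_u32 >> 16).to(torch.uint16))
    mantissa.copy_(acc_m_u32.to(torch.uint16))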

It is even less clear why this implementation is faster than a direct approach.
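For readers unfamiliar with the trick, here is a minimal pure-Python sketch of the bit layout the snippet relies on: a float32 word is split into its high 16 bits (which have the bfloat16 layout) and its low 16 bits, then rejoined for the update. The helper names (`split_f32`, `join_f32`, `step`) are made up for illustration and are not from the speedrun code. Python floats are doubles, so rounding may differ slightly from the in-place float32 ops above. The sketch says nothing about why the original is fast.

```python
import struct

def f32_to_bits(x: float) -> int:
    """Round x to float32 and return its 32-bit pattern as an int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_f32(bits: int) -> float:
    """Interpret a 32-bit pattern as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def split_f32(x: float) -> tuple[int, int]:
    """Return (high 16 bits, low 16 bits) of the float32 encoding of x."""
    bits = f32_to_bits(x)
    return bits >> 16, bits & 0xFFFF

def join_f32(hi: int, lo: int) -> float:
    """Rebuild the float32 value from its two 16-bit halves."""
    return bits_to_f32((hi << 16) | lo)

def step(hi: int, lo: int, grad: float, lr: float, wd: float) -> tuple[int, int]:
    """Hypothetical update: join, decay, apply the gradient step, split again."""
    w = join_f32(hi, lo)
    w = w * (1.0 - wd)
    w = w - lr * grad
    return split_f32(w)

# 1.0 is 0x3F800000 in float32, so hi = 0x3F80 and lo = 0.
hi, lo = split_f32(1.0)
# An update far below bf16 resolution near 1.0 leaves hi unchanged
# and is kept entirely in the low half.
print(step(hi, lo, grad=-1.0, lr=1e-6, wd=0.0))  # (16256, 8), i.e. hi = 0x3F80, lo = 8
```

The point of the example is only that the low half preserves updates that the 16-bit high half alone would drop.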

I propose opening a High-Score (HS) track aimed at balancing legibility and efficiency. This is my draft:

  1. Models must be trained on a predefined number x of tokens (e.g., 2 billion). These tokens must be presented in the same fixed order in every run. Exiting early, skipping data, or using any piece of data more than once is prohibited.
  2. The total number of active parameters must not exceed y million (yM).
  3. The total training time must not exceed z minutes. The value of z should be slightly higher than the typical runtime of a standard training run (without NanoGPT-specific optimizations). Runs that exceed the time limit without processing all tokens will be disqualified.
  4. Evaluate the model on w predefined downstream NLP benchmarks. The score will be calculated as the average accuracy across these benchmarks.
  5. (Optional) Deduct 0.001% of the score for every valid line of code. This can serve as a Kolmogorov-complexity-style penalty term, encouraging concise and efficient implementations.
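To make the draft concrete, the rules could be checked and scored roughly as follows. This is only a sketch under assumptions the draft does not settle: the penalty is taken to be relative (each line multiplies away 0.001% of the average accuracy), a "valid line" is taken to mean a non-blank line that is not a pure `#` comment, and all names (`Run`, `is_eligible`, `hs_score`, `count_valid_lines`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Run:
    tokens_seen: int        # tokens actually processed, each exactly once
    active_params: int      # number of active parameters
    minutes: float          # wall-clock training time
    accuracies: list[float] # one accuracy in [0, 1] per benchmark

def is_eligible(run: Run, x_tokens: int, y_params: int, z_minutes: float) -> bool:
    """Rules 1-3: exact token budget, parameter cap, time limit."""
    return (
        run.tokens_seen == x_tokens
        and run.active_params <= y_params
        and run.minutes <= z_minutes
    )

def count_valid_lines(source: str) -> int:
    """Assumed definition: non-blank lines that are not pure comments."""
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )

def hs_score(run: Run, n_lines: int, penalty_per_line: float = 1e-5) -> float:
    """Rule 4 (mean accuracy) with the optional rule 5 penalty (0.001% per line)."""
    if not run.accuracies:
        raise ValueError("at least one benchmark accuracy is required")
    mean_acc = sum(run.accuracies) / len(run.accuracies)
    return mean_acc * (1.0 - penalty_per_line * n_lines)

run = Run(tokens_seen=2_000_000_000, active_params=120_000_000,
          minutes=55.0, accuracies=[0.5, 0.7])
if is_eligible(run, x_tokens=2_000_000_000, y_params=124_000_000, z_minutes=60.0):
    print(hs_score(run, n_lines=1000))  # about 0.594: mean 0.6 reduced by 1%
```

Under this reading, a 1000-line submission gives up 1% of its score, so the penalty only matters between runs of nearly equal accuracy.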
