
wall-clock time / GPU-hours to reproduce (GRPO on Qwen2.5-3B/7B) #9

@mrT333

Description


Hi, thanks for releasing the code and paper!

Could you share a ballpark training duration so we can budget the compute needed to reproduce your results?

From the paper I see you used 4 × A100 (80 GB), and Graph-R1 was trained with GRPO for 3 epochs with batch size 128 and max length 4096 (per Appendix G). Step 4 in this repo's README also says:

“Run GRPO/REINFORCE++/PPO training with Qwen2.5-3B-Instruct (Need 4 × 48 GB GPUs)”

Roughly how long should we expect training on 4 × 48 GB GPUs (e.g., A40s on RunPod) to take to reach your reported results on, say, HotpotQA? Is it closer to a few hours, tens of GPU-hours, or hundreds of GPU-hours?

If possible, any of the following would help a ton:

  • Wall-clock time per epoch (and total) on your 4 × A100 80 GB setup for Qwen2.5-3B and 7B
  • Throughput you observed (tokens/sec or samples/sec), plus grad-accum and precision settings
  • Approximate number of training steps actually run (did you early-stop before 3 epochs or go beyond?)
  • Any dataset-specific runtime notes for HotpotQA or any other dataset (e.g., typical effective sequence lengths)

Totally fine to answer at the order-of-magnitude level; we're just trying to size the experiment budget. Thanks!
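
For reference, here is the trivial back-of-envelope we plan to use once we have a per-epoch number. Every value below is a made-up placeholder (GPU count aside), not a figure from the paper or this repo; the hours-per-epoch constant is exactly the unknown we are asking about:

```python
# Back-of-envelope budget estimator. All numbers are placeholder assumptions,
# NOT figures from the paper or this repo.

NUM_GPUS = 4                  # e.g. 4 x A40 48 GB on RunPod
HOURLY_RATE_PER_GPU = 0.79    # USD/hr, hypothetical cloud price
EPOCHS = 3                    # per Appendix G of the paper
HOURS_PER_EPOCH = 6.0         # placeholder; the unknown this issue asks about

wall_clock_hours = EPOCHS * HOURS_PER_EPOCH
gpu_hours = wall_clock_hours * NUM_GPUS
cost_usd = gpu_hours * HOURLY_RATE_PER_GPU

print(f"wall clock : {wall_clock_hours:.1f} h")
print(f"GPU-hours  : {gpu_hours:.1f}")
print(f"est. cost  : ${cost_usd:.2f}")
```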
