Hi, thanks for releasing the code and paper!
Could you share a ballpark training duration so we can budget the cost of reproducing your results?
From the paper I see you used 4 × A100 (80 GB) GPUs, and Graph-R1 was trained with GRPO for 3 epochs with batch size 128 and max length 4096 (per Appendix G). Step 4 in the README of this repo also says:
“Run GRPO/REINFORCE++/PPO training with Qwen2.5-3B-Instruct (Need 4 × 48 GB GPUs)”
Roughly how long should we expect training to take on 4 × 48 GB GPUs (e.g., A40s on Runpod) to reach your reported results on, say, HotpotQA? Is this closer to a few hours, tens of GPU-hours, or hundreds of GPU-hours?
If possible, any of the following would help a ton:
- Wall-clock time per epoch (and total) on your 4 × A100 80 GB setup for Qwen2.5-3B and 7B
- Throughput you observed (tokens/sec or samples/sec), plus grad-accum and precision settings
- Approx # of training steps actually run (did you early-stop before 3 epochs or go beyond?)
- Any dataset-specific runtime notes for HotpotQA or any other dataset (e.g., typical effective sequence lengths)
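In case it helps frame the ask, here's the rough back-of-envelope I'd plug those numbers into. Everything below marked as a placeholder (dataset size, throughput) is a guess on my end, not something taken from the paper:

```python
# Rough GPU-hour estimate for budgeting. Every value marked "placeholder" is a
# guess I'd replace with the real numbers requested above; nothing here is from the paper.

num_epochs = 3            # per Appendix G
batch_size = 128          # per Appendix G
train_examples = 90_000   # placeholder: size of the RL training split
num_gpus = 4              # 4 x 48 GB GPUs (the setup I'm pricing)

steps = num_epochs * train_examples // batch_size

samples_per_sec = 1.0     # placeholder: the observed throughput I'm asking about
wall_clock_hours = (steps * batch_size) / samples_per_sec / 3600
gpu_hours = wall_clock_hours * num_gpus

print(f"~{steps} steps, ~{wall_clock_hours:.1f} h wall-clock, ~{gpu_hours:.0f} GPU-hours")
```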
Totally fine to answer at an order-of-magnitude level; I'm just trying to size the experiment budget. Thanks!