@Jackmin801
Member

Add integration tests that validate benchmark metrics (TPS, step time, MFU, peak memory) against baselines for A6000 GPU configurations.

Tests include:

  • test_benchmark_no_regression: Runs benchmarks and checks for regression
  • test_baseline_exists: Validates baseline files exist and are well-formed
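
A minimal sketch of what a well-formedness check in the spirit of test_baseline_exists could look like. The PR does not state the baseline file format or schema, so the JSON layout, the metric key names (`tps`, `step_time`, `mfu`, `peak_memory`) and the helper `baseline_problems` are all assumptions made for illustration:

```python
import json
import math

# Hypothetical metric keys; the real baseline schema may differ.
REQUIRED_METRICS = ("tps", "step_time", "mfu", "peak_memory")


def baseline_problems(text: str) -> list[str]:
    """Return the problems found in a baseline JSON document (empty if well-formed)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]

    problems = []
    for key in REQUIRED_METRICS:
        value = data.get(key)
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{key}: missing or not a number")
        elif not math.isfinite(value) or value <= 0:
            problems.append(f"{key}: must be a finite positive number")
    return problems
```

A test could read each baseline file and assert that `baseline_problems(path.read_text())` is empty, so a missing or malformed metric fails with a readable message.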

The tests use a 5% regression threshold (consistent with the CI workflow) and cover all 6 A6000 benchmark configurations (each seq_len listed in parentheses is a separate configuration):

  • Qwen3-0.6B RL Full (16384, 65536 seq_len)
  • Qwen3-0.6B RL LoRA r=16 (16384, 65536 seq_len)
  • Qwen3-0.6B SFT Full (8192 seq_len)
  • Qwen3-4B-Instruct-2507 RL LoRA r=16 (16384 seq_len)
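
The regression check described above can be sketched roughly as follows. This is an illustration, not the code in this PR: the metric key names, the direction assigned to each metric (higher TPS and MFU are better, lower step time and peak memory are better) and the helper `find_regressions` are assumptions; only the 5% threshold comes from the description.

```python
# Assumed direction of each metric (hypothetical key names).
HIGHER_IS_BETTER = {
    "tps": True,
    "mfu": True,
    "step_time": False,
    "peak_memory": False,
}
THRESHOLD = 0.05  # 5% regression threshold, as stated in the description


def find_regressions(baseline: dict, current: dict, threshold: float = THRESHOLD) -> dict:
    """Return {metric: relative_worsening} for metrics that regressed past the threshold."""
    regressions = {}
    for name, higher_is_better in HIGHER_IS_BETTER.items():
        base, cur = baseline[name], current[name]
        # Relative change oriented so that a positive value means "worse".
        if higher_is_better:
            change = (base - cur) / base
        else:
            change = (cur - base) / base
        if change > threshold:
            regressions[name] = change
    return regressions
```

In a test, each of the configurations could be one parametrized case that runs the benchmark, calls `find_regressions(baseline, measured)` and asserts that the result is empty.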
