

@Manan17 Manan17 commented Jun 5, 2025

Summary

Just testing out logprobs as mentioned in #742.
It worked for the models where the test using logits was failing.
Also tried setting a 1e-1 tolerance for Qwen (previously 1), and it passed.

Testing Done

  • Hardware Type:
  • run make test to ensure correctness
  • run make checkstyle to ensure code style
  • run make test-convergence to ensure convergence

Comment on lines +899 to 900
1e-1, # 1e-1
1e-1, # 1e-2
Collaborator


After removing all logprobs comparison, we can try setting it lower.
sglang only has atol and sets it to 5e-2 (decode_tolerance)
verl sets (atol, rtol) = (1e-2, 1e-5), but it's mean of all logprobs not topk
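For reference, both of those settings are instances of the standard allclose-style criterion |actual - expected| <= atol + rtol * |expected|; a minimal sketch in plain Python, with illustrative values:

```python
# Allclose-style criterion that both sglang and verl approximate (sketch).
def within_tolerance(actual, expected, atol=0.0, rtol=0.0):
    # |actual - expected| <= atol + rtol * |expected|
    return abs(actual - expected) <= atol + rtol * abs(expected)

# sglang-style: a single absolute tolerance on logprobs (atol=5e-2).
print(within_tolerance(-1.02, -1.00, atol=5e-2))             # True
# verl-style: tight atol plus a tiny rtol (atol=1e-2, rtol=1e-5).
print(within_tolerance(-1.02, -1.00, atol=1e-2, rtol=1e-5))  # False
```

The same 0.02 log-space gap passes the sglang-style check but fails the verl-style one, which is why the mean-vs-topk distinction below matters.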

Contributor Author

@Manan17 Manan17 Jun 5, 2025


It does not pass with a lower tolerance.
For gemma3, it passes when atol=1e-1 and rtol=1.

Contributor Author


I tested this out with fp32; it fails for most of the models where the old logits-checking logic passes.

Collaborator


Since we are comparing values in log-space, the total tolerance here is effectively a relative tolerance on the underlying probabilities.

Contributor Author


Can we just check the rtol?
like: tolerance = rtol * torch.abs(tensor2)

Collaborator

@Tcc0403 Tcc0403 Jun 9, 2025


The absolute diff of two logprobs (logA - logB) corresponds to the relative diff of the two probs (A / B), which means the whole tolerance (atol + rtol * torch.abs(expected)) should be read as the maximum relative diff we can accept.

I think that's also why sglang only has a single tolerance in their test.
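A quick numeric check of that identity, with illustrative probabilities: an absolute tolerance on logprobs bounds the ratio of the underlying probabilities.

```python
import math

# Two hypothetical probabilities for the same token from two implementations.
A, B = 0.105, 0.100

# Absolute diff in log-space equals the log of the probability ratio.
log_diff = abs(math.log(A) - math.log(B))
print(math.isclose(log_diff, abs(math.log(A / B))))  # True

# So atol=5e-2 on logprobs accepts probs differing by a factor of up to e**0.05,
# i.e. roughly a 5% relative difference.
print(math.exp(0.05))
```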


Manan17 commented Jun 8, 2025

@Tcc0403 Can you have a look at the changes? I have tested them.
Let me know what you think, and I will update the multimodal tests as well.
What should be done for test_mini_models_with_logits?


Tcc0403 commented Jun 9, 2025

> What should be done for test_mini_models_with_logits

check logprobs as well for consistency

I'm planning to rewrite convergence tests so just ignore namings for now.


Manan17 commented Jun 9, 2025

> > What should be done for test_mini_models_with_logits
>
> check logprobs as well for consistency
>
> I'm planning to rewrite convergence tests so just ignore namings for now.

Gotcha!
I tried testing with mean logprobs as well.
The tests pass with lower tolerance values. Verl has set atol=1e-2 and rtol=1e-5, which works for us as well in bf16.


Tcc0403 commented Jun 9, 2025

> I tried testing with mean logprobs as well.
> The tests pass with lower tolerance values. Verl has set atol=1e-2 and rtol=1e-5, which works for us as well in bf16.

Which mean logprobs do you pick? I checked the verl implementation; they pick per-token logprobs for the given labels.
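For context, the verl-style extraction described here can be sketched as follows (function name and shapes are hypothetical, not the actual verl code): gather the logprob of each label token from the log_softmax output, then average.

```python
import torch
import torch.nn.functional as F

def mean_label_logprob(logits, labels):
    # logits: (batch, seq, vocab); labels: (batch, seq)
    logprobs = F.log_softmax(logits.float(), dim=-1)
    # Pick the logprob of each label token, not the top-k of the distribution.
    per_token = logprobs.gather(-1, labels.unsqueeze(-1)).squeeze(-1)
    return per_token.mean()

logits = torch.randn(2, 4, 8)
labels = torch.randint(0, 8, (2, 4))
print(mean_label_logprob(logits, labels))  # a single scalar, always <= 0
```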


Manan17 commented Jun 10, 2025

I tried top 20 logprobs and it was able to pass tests for all the models! @Tcc0403
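A minimal sketch of a top-k logprob comparison along those lines (function name and tolerances are illustrative, not the exact test code): take the top-k indices from one side and gather the same vocab positions from the other, so the compared values are index-aligned.

```python
import torch
import torch.nn.functional as F

def topk_logprobs_close(logits_a, logits_b, k=20, atol=1e-1, rtol=1e-1):
    lp_a = F.log_softmax(logits_a.float(), dim=-1)
    lp_b = F.log_softmax(logits_b.float(), dim=-1)
    # Top-k indices come from one side; gather the same positions on the other.
    top_vals, top_idx = lp_a.topk(k, dim=-1)
    return torch.allclose(top_vals, lp_b.gather(-1, top_idx), atol=atol, rtol=rtol)

logits = torch.randn(2, 4, 64)
# softmax is shift-invariant, so a constant offset should still compare equal
print(topk_logprobs_close(logits, logits + 0.5))  # True
```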


Manan17 commented Jun 11, 2025

The tolerance for the gemma3 multimodal model had to be set high, as it does not pass the loss and topk_logprobs tests.
Both atol and rtol are set to 1e-1.


Tcc0403 commented Jun 11, 2025

> The tolerance for the gemma3 multimodal model had to be set high, as it does not pass the loss and topk_logprobs tests.
> Both atol and rtol are set to 1e-1.

Yeah, I think we can compromise with 1e-1 before further investigation into the numerical issue. Just make them all green first unless there's an obvious mismatch.


@shimizust shimizust left a comment


Thanks for making these changes!

@shimizust shimizust merged commit 1f640a5 into linkedin:main Jun 13, 2025
3 of 7 checks passed
@Manan17 Manan17 changed the title from "Trying out logprobs and top logprobs for testing rather than logits." to "Changed tests from logits to topk logprobs" on Jul 9, 2025