[sharktank] toy model for gpt-oss #2516
base: main
Conversation
result2 = decoder.prefill_cross_entropy([self.sequence])[0]
self.assertEqual(result.score, result2.score)
What is this doing?
I believe this is testing determinism between two prefills with the same inputs, which shouldn't be a concern once the model bring-up is fairly stable. So it can be removed.
assert result.valid

shark_ce = 4.6970133781433105
torch.testing.assert_close(result.score, shark_ce, atol=1e-2, rtol=1e-2)
How come the tolerances are so large?
FWIW, that's the tolerance used for the toy llama too; cross entropy is the concern here. Added a comment above to resolve it.
if ref_model is None:
    return
If ref_model is None we should be erroring out, not passing silently.
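A minimal sketch of the suggested change, reusing only names already in the diff:

```python
if ref_model is None:
    # Fail loudly instead of silently skipping the reference comparison.
    raise ValueError("ref_model is required; construct the reference model before comparing")
```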
)


def copy_weights_to_reference(shark_theta, ref_model, hp):
This does not belong in the toy model file.
def calculate_cross_entropy_manual(
    model_instance, sequence: list[int], use_prefill: bool = True
) -> tuple[float, float]:
    """Calculate cross entropy and perplexity manually for debugging."""
    evaluator = model_instance.make_perplexity_eval()
    if use_prefill:
        res = evaluator.prefill_cross_entropy([sequence])[0]
    else:
        res = evaluator.decode_cross_entropy([sequence])[0]

    assert res.valid
    ce = res.score
    ppl = float(torch.exp(torch.tensor(ce)))

    print("cross_entropy_nats:", ce)
    print("perplexity:", ppl)
    return ce, ppl
This does not belong here.
def make_simple_analytical_gpt_oss_theta(
    config: LlamaModelConfig,
    vocab_size: Optional[int] = None,
    dtype_rest: torch.dtype = torch.bfloat16,
    dtype_norm: torch.dtype = torch.bfloat16,
) -> Theta:
    """Generate a GPT-OSS theta with simple analytical weights for hand calculation."""
    return make_random_gpt_oss_theta(
        config=config,
        vocab_size=vocab_size,
        dtype_rest=dtype_rest,
        dtype_norm=dtype_norm,
        weight_generator=make_simple_calculable_weight_torch,
    )
This function is not needed. All it does is wrap make_random_gpt_oss_theta and pass one extra argument.
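Callers could invoke the underlying helper directly instead; a sketch using only names from the diff:

```python
theta = make_random_gpt_oss_theta(
    config=config,
    vocab_size=vocab_size,
    dtype_rest=torch.bfloat16,
    dtype_norm=torch.bfloat16,
    weight_generator=make_simple_calculable_weight_torch,
)
```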
Can rename to ref_pytorch_model.py and, if it's only used in testing, move it to tests/models/gpt_oss/.
)


def make_gpt_oss_attention_block_theta(
Ideally, we want to define theta layers in sharktank/layers/testing.py and construct the blocks accordingly under the respective /sharktank/models/<model_name>.
Also, try reusing existing make-theta functions (like attn, moe) wherever possible, adding extra layers like bias as you construct them here.
The vision models are not following this pattern and need to be consolidated to align with the LLMs.
@@ -0,0 +1,386 @@
import json
Add the following to all the new files:
# Copyright 2025 Advanced Micro Devices, Inc.
#
# Licensed under the Apache License v2.0 with LLVM Exceptions.
# See https://llvm.org/LICENSE.txt for license information.
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
    block_count=block_count,
)
decoder = instance.make_decoder()
generated_tokens = decoder.greedy_decode([[0]], steps=14)[0]
It's necessary to link the input here directly, by replacing [[0]] with self.sequence[0].
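A one-line sketch of the suggested linkage, assuming self.sequence is the hardcoded token list from setUp:

```python
# Seed greedy decoding with the first token of the shared test sequence.
generated_tokens = decoder.greedy_decode([[self.sequence[0]]], steps=14)[0]
```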
decoded = decoder.greedy_decode([[0]], steps=len(expected))[0]
decoded2 = decoder.greedy_decode([[0]], steps=len(expected))[0]
Same comment as above about linking input.
decoded = decoder.greedy_decode([[0]], steps=len(expected))[0]
decoded2 = decoder.greedy_decode([[0]], steps=len(expected))[0]

self.assertEqual(decoded, decoded2)
Determinism is verified by the assert below. This assert does almost the same thing, except it runs the test one more time. So it can be removed.
    return theta, config


def generate_analytical(
Is it possible to consolidate both generate functions by having a default LlamaHParams, where the analytical variant overrides only the necessary args?
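One possible shape for the consolidation, sketched under assumptions: make_random_gpt_oss_theta and make_simple_calculable_weight_torch come from the diff, while default_toy_hparams() and the default weight generator name are hypothetical placeholders:

```python
import dataclasses

# Hypothetical consolidated generator; variants differ only in overrides.
def generate(seed: int, weight_generator=make_random_weight_torch, **hp_overrides):
    torch.manual_seed(seed)
    # Start from one default LlamaHParams and replace only what a variant needs.
    hp = dataclasses.replace(default_toy_hparams(), **hp_overrides)
    config = LlamaModelConfig(hp=hp)
    theta = make_random_gpt_oss_theta(config=config, weight_generator=weight_generator)
    return theta, config

# The analytical variant then reduces to a thin call:
# theta, config = generate(seed=12345, weight_generator=make_simple_calculable_weight_torch)
```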
    ref_model.unembedding.weight.data = shark_theta("output", "weight").as_torch()


def calculate_cross_entropy_manual(
This essentially does what we have in the test below. I don't see a need to have this separately.
self.seed = 12345

# Hardcoded for CI performance - regenerate with self.generate_sequence() if weights change
self.sequence = [0, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6]
Can we generate this input from the toy irpa model, so that the cross entropy isn't very high?
Start the prompt with 0, then run prefill or a subsequent decode step each time (see the sketch below):
[0] -> [2]
[0 2] -> [5]
[0 2 5] -> [9]
[0 2 5 9] -> [4]
...
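A hedged sketch of that regeneration loop, reusing the decoder from the test; the exact greedy_decode return shape is assumed:

```python
sequence = [0]  # start the prompt with 0
for _ in range(14):
    # Feed the prefix generated so far; with steps=1 the last element
    # of the result is the newly produced token either way.
    next_token = decoder.greedy_decode([sequence], steps=1)[0][-1]
    sequence.append(next_token)
print("regenerated sequence:", sequence)
```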
result2 = decoder.decode_cross_entropy([self.sequence])[0]
self.assertEqual(result.score, result2.score)
Can remove all asserts verifying determinism.
| """Test reference and sharktank model e2e comparison.""" | ||
|
|
||
| def setUp(self): | ||
| logging.basicConfig(level=logging.INFO) |
Shouldn't this be set globally for this whole test?
Also, the print statements here can be replaced with logging.
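A minimal sketch of that swap, assuming a module-level logger:

```python
logger = logging.getLogger(__name__)

# Replaces: print(f"Full test sequence: {full_sequence}")
logger.info("Full test sequence: %s", full_sequence)
```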
| print(f"Full test sequence: {full_sequence}") | ||
| return full_sequence | ||
|
|
||
| def testDecodeSequence(self): |
On that note, I'm not sure we need this test at all. We can stick with cross entropy for prefill/decode in eager/IREE modes.
    count += 1

ref_ce = total_loss / count
ref_ppl = float(torch.exp(torch.tensor(ref_ce)))
We don't seem to be verifying ref_ppl across both tests?
torch.testing.assert_close(
    shark_result.score, expected_ce, atol=1e-2, rtol=1e-2
)
torch.testing.assert_close(ref_ce, expected_ce, atol=1e-2, rtol=1e-2)
Shouldn't we be comparing ref_ce with shark_result.score?
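A sketch of the suggested direct comparison, keeping the tolerances from the diff:

```python
# Compare the reference cross entropy directly against the sharktank result.
torch.testing.assert_close(ref_ce, shark_result.score, atol=1e-2, rtol=1e-2)
```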
torch.testing.assert_close(
    shark_result.score, expected_ce, atol=1e-2, rtol=1e-2
)
torch.testing.assert_close(ref_ce, expected_ce, atol=1e-2, rtol=1e-2)
Same as above, compare ref_ce with shark_result.score?
ref_model.embedding.weight.data = shark_theta("token_embd", "weight").as_torch()

# Copy transformer blocks
for block_idx in range(hp.block_count):
Mapping from safetensors to irpa (or vice versa) is usually more readable with a mapping dict, like here. Is this something we can leverage here too?
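A hedged sketch of the mapping-dict approach; the irpa names come from the diff, while the reference-model attribute paths are illustrative assumptions:

```python
# Map reference-model parameter paths to irpa theta keys (illustrative entries).
WEIGHT_MAP = {
    "embedding.weight": ("token_embd", "weight"),
    "unembedding.weight": ("output", "weight"),
    # Per-block entries could be generated with f-strings over block_idx.
}

for ref_path, irpa_key in WEIGHT_MAP.items():
    # nn.Module.get_parameter resolves dotted paths like "embedding.weight".
    ref_model.get_parameter(ref_path).data = shark_theta(*irpa_key).as_torch()
```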
Adding a toy model and e2e testing for gpt-oss.