LLM Layer testcases #2289
Conversation
PhaneeshB commented on Sep 19, 2025
- Add dummy layers test for most llama layers
- Add real weights test for rms_norm + lm_head layers
Let's try to focus in on a reusable component rather than a monolithic end-to-end runner. We want a thing that takes a torch.Module and generates an equivalent module that uses IREE as a backend. This thing should do the following:
- Export the MLIR functions from the module
- Compile the exported MLIR
- Load the compiled module into a runtime with device setup
- Return the invokable module

We don't want to include execution, since once exported the module could be invoked multiple times by tests.
As a simple V0, let's assume a single forward function: export the module and reload it into an IREE runtime.
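A minimal sketch of such a component, assuming iree-turbine's `aot.export` and `iree.runtime.load_vm_flatbuffer_file`; the helper name and the temp path are illustrative, not existing repo code:

```python
import tempfile

import torch
import iree.runtime as ireert
from iree.turbine import aot


def iree_backed_module(module: torch.nn.Module, example_args: tuple,
                       driver: str = "local-task"):
    """Export `module` to MLIR, compile it, and return an invokable IREE module."""
    # 1. Export the forward function to MLIR.
    export_output = aot.export(module, *example_args)
    # 2. Compile the exported MLIR to a VM flatbuffer.
    vmfb_path = tempfile.mktemp(suffix=".vmfb")  # illustrative location
    export_output.compile(save_to=vmfb_path)
    # 3. Load the compiled module into a runtime with the device set up, and
    # 4. return the invokable module; execution is left to the caller, which
    #    can invoke it as many times as needed (e.g. iree_module.main(x)).
    return ireert.load_vm_flatbuffer_file(vmfb_path, driver=driver)
```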
As a first step in this direction, I've pushed changes that refactor the existing method to:
- Export the module and capture eager outputs
- Compile the exported MLIR
- Run with IREE to generate outputs
- Compare eager vs IREE outputs
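For illustration, those four steps could be wrapped in a single comparison helper along these lines; this is a sketch reusing the hypothetical `iree_backed_module` above, and the PR's actual helper (`run_iree_vs_torch_fx`, imported later in the diff) may differ in signature:

```python
import numpy as np
import torch


def compare_eager_vs_iree(module: torch.nn.Module, args: tuple, atol: float = 1e-4):
    # Capture the eager (reference) outputs.
    eager_out = module(*args)
    # Export, compile, and load via the component sketched above.
    iree_module = iree_backed_module(module, args)
    # Run with IREE to generate outputs.
    iree_out = iree_module.main(*[a.numpy() for a in args])
    # Compare eager vs IREE outputs.
    torch.testing.assert_close(
        torch.as_tensor(np.asarray(iree_out)), eager_out, atol=atol, rtol=0
    )
```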
From what I understand of the requirement for this layer-based testing, I've structured each unit (layer) test to do just that: compare torch vs IREE for accuracy (which, in a passing test, consequently validates export, compilation, and vmfb execution).

> We don't want to include execution, since once exported the module could be invoked multiple times by tests.

When you say multiple invocations, what changes between those invocations? I'm trying to understand whether it's about using the same vmfb with different input args (in a single test), or whether you have some other use case in mind. Since this is a layer-wise test we are working with random inputs; my thinking has been to add a simple-to-setup test that can do a torch-vs-IREE run for a given module, including export.
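To make the "multiple invocations" case concrete: once exported and compiled, the same loaded module can be driven with different input args without re-exporting. A sketch, again using the hypothetical `iree_backed_module` helper:

```python
import numpy as np
import torch

m = torch.nn.Linear(64, 64, bias=False)
# Export and compile once ...
iree_m = iree_backed_module(m, (torch.randn(4, 64),))

# ... then invoke the same compiled module repeatedly with different inputs.
for _ in range(3):
    x = torch.randn(4, 64)
    got = torch.as_tensor(np.asarray(iree_m.main(x.numpy())))
    torch.testing.assert_close(got, m(x), atol=1e-4, rtol=0)
```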
```python
return module.forward(*fn_args)
...
export_output = export(fxb, import_symbolic_shape_expressions=True)
export_output.save_mlir(mlir_path)
```
It shouldn't be necessary to save the IR to a file. Could this be a flag so that devs can save the IR to a path for debugging, but otherwise use the exported version directly?
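One way to phrase that as a flag, keeping the in-memory export as the default and only writing the IR when a debug path is passed; the parameter and function names here are illustrative, while `export`, `fxb`, and `import_symbolic_shape_expressions` come from the snippet above:

```python
from pathlib import Path
from typing import Optional

from iree.turbine.aot import export


def export_programs(fxb, save_mlir_to: Optional[Path] = None):
    export_output = export(fxb, import_symbolic_shape_expressions=True)
    if save_mlir_to is not None:
        # Only persist the IR when a developer explicitly asks for it.
        export_output.save_mlir(save_mlir_to)
    return export_output
```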
For the current use case we are saving this in a tmp dir. From a standalone-method point of view, we just expect it to export the IR along with the outputs.
```python
Args:
    module: torch.nn.Module under test
    args: example positional inputs (tuple required)
```
`input_args` would make this clearer.
```python
@pytest.mark.parametrize("dtype,atol", [(torch.float32, 1e-4), (torch.float16, 1e-4)])
def test_linear_iree_vs_eager(dtype, atol):
    torch.manual_seed(0)
    m = Linear(64, 64, bias=False, dtype=dtype)
```
We should be testing this linear against our sharktank linear layer
That might not be what we are testing here. For the layer-based tests we are comparing torch vs IREE execution for different sharktank LLM model layers.
The goal of the layer-based test suite is to cover layers that correspond to actual LLM model layers across variants (for llama) that we export in sharktank. To get started, we added tests with a close-but-dummy implementation (same structure, smaller size, random weights); the current linear layer test is one of these dummy tests. In the next steps we want to update this to create the linear layer using an irpa file, and when adding that test we shall keep this one as a mock test so the suite can run without an irpa file.
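For when that step lands, a rough sketch of a LinearLayer test built from an in-memory theta rather than an irpa file; it assumes sharktank's `Theta`, `DefaultPrimitiveTensor`, and `LinearLayer` APIs roughly as used elsewhere in the repo, and reuses the hypothetical comparison helper sketched earlier:

```python
import torch
from sharktank.layers import LinearLayer
from sharktank.types import DefaultPrimitiveTensor, Theta


def make_linear_layer(in_dim=64, out_dim=64, dtype=torch.float32):
    # Random weights stand in for real irpa parameters in the mock variant.
    weight = torch.rand(out_dim, in_dim, dtype=dtype)
    theta = Theta({"weight": DefaultPrimitiveTensor(name="weight", data=weight)})
    return LinearLayer(theta)


def test_sharktank_linear_iree_vs_eager():
    torch.manual_seed(0)
    layer = make_linear_layer()
    x = torch.rand(4, 64)
    compare_eager_vs_iree(layer, (x,), atol=1e-4)
```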
```python
)

# Output linear layer (language model head)
self.output_lm_head = LinearLayer(
```
This linear layer is a separate sharktank "layer". I would just stick to the RMSNormLayer above for this test and leave this LinearLayer to the linear test above.
Yup, this is a composite test since RMSNorm + Linear is the last step in getting the logits output. It serves as a starting point for adding other tests.
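A compressed illustration of that composite (RMSNorm feeding the lm_head projection): the sharktank `RMSNormLayer`/`LinearLayer` constructor arguments here are assumptions, the shapes are toy-sized, and this is not the PR's actual class:

```python
import torch
from sharktank.layers import LinearLayer, RMSNormLayer
from sharktank.types import DefaultPrimitiveTensor, Theta


class OutputLMHead(torch.nn.Module):
    """RMSNorm + lm_head linear: the final step that produces the logits."""

    def __init__(self, hidden_dim=64, vocab_size=128, dtype=torch.float32):
        super().__init__()
        norm_theta = Theta(
            {"weight": DefaultPrimitiveTensor(name="weight", data=torch.rand(hidden_dim, dtype=dtype))}
        )
        head_theta = Theta(
            {"weight": DefaultPrimitiveTensor(name="weight", data=torch.rand(vocab_size, hidden_dim, dtype=dtype))}
        )
        self.norm = RMSNormLayer(norm_theta, epsilon=1e-5)
        self.output_lm_head = LinearLayer(head_theta)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.output_lm_head(self.norm(h))
```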
```python
    return logits


def create_output_lm_head_from_irpa(irpa_path: str) -> tuple[OutputLMHead, torch.Tensor]:
```
We shouldn't need to depend on an irpa file. We can create a theta, like we do here, and use that for testing.
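For illustration, the irpa-backed construction could be swapped for an in-memory theta of random tensors; `Dataset.load` is shown only to mark what it would replace, and the `Theta`/`DefaultPrimitiveTensor` usage is the same assumed API as in the earlier sketches:

```python
import torch
from sharktank.types import DefaultPrimitiveTensor, Theta
# Irpa-backed path this would replace:
#   from sharktank.types import Dataset
#   theta = Dataset.load(irpa_path).root_theta

# In-memory theta with random weights; no irpa file required for the test.
theta = Theta(
    {"weight": DefaultPrimitiveTensor(name="weight", data=torch.rand(128, 64))}
)
```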
```python
import pytest
from sharktank.utils._helpers import run_iree_vs_torch_fx


class RMSNorm(torch.nn.Module):
```
This test feels unnecessary if we test RMSNormLayer above
It will be used as a mock test. In the next steps this will also contain an RMSNorm layer test created using the parameters, which is currently part of output_lm_test_with_iree.