Conversation

@PhaneeshB (Contributor):

  • Add dummy layers test for most llama layers
  • Add real weights test for rms_norm + lm_head layers

Collaborator:

Let's try to focus in on a reusable component rather than a monolithic end-to-end runner. We want a thing that takes a torch.nn.Module and generates an equivalent module that uses IREE as a backend. This thing should do the following:

  • Export the MLIR functions from each
  • Compile the exported MLIR
  • Load the module into a runtime with device setup
  • Return the invocable module

We don't want to include execution: once exported, the module can be invoked multiple times for testing.

As a simple V0, let's assume a single forward function: export the module and reload it into an IREE runtime.
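For illustration, a minimal sketch of such a helper, assuming iree-turbine's aot.export plus the iree.compiler / iree.runtime Python APIs; the name iree_backed_module and the default backend/driver choices here are made up, not existing sharktank utilities:

```python
import tempfile
from pathlib import Path

import torch
import iree.compiler as ireec
import iree.runtime as ireert
from iree.turbine import aot


def iree_backed_module(
    module: torch.nn.Module,
    example_args: tuple,
    target_backend: str = "llvm-cpu",
    driver: str = "local-task",
):
    """Export, compile, and load `module`; return the invocable IREE module."""
    # 1. Export MLIR for the single forward function.
    export_output = aot.export(module, *example_args)
    # 2. Compile the exported MLIR to a VM flatbuffer.
    with tempfile.TemporaryDirectory() as tmp:
        mlir_path = str(Path(tmp) / "module.mlir")
        export_output.save_mlir(mlir_path)
        vmfb = ireec.compile_file(mlir_path, target_backends=[target_backend])
    # 3. Load the flatbuffer into a runtime with the requested driver.
    vm_module = ireert.load_vm_flatbuffer(vmfb, driver=driver)
    # 4. Return the invocable module without executing it; the caller can
    #    invoke the exported entry point (e.g. vm_module.main(...)) as many
    #    times as needed.
    return vm_module
```

Whether the exported entry point ends up named main depends on how the export is set up, so treat that part as an assumption.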

@PhaneeshB (Contributor, Author) commented on Sep 24, 2025:

As a first step in this direction, I've pushed changes that refactor the existing method to:

  1. Export and capture eager outputs
  2. Compile
  3. Run with IREE to generate outputs
  4. Compare the eager vs IREE outputs (a rough sketch of this step is below)
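For reference, the comparison at the end of that flow could look roughly like this; it assumes the IREE result is numpy-convertible and uses torch.testing.assert_close (the helper name assert_iree_matches_eager is a placeholder):

```python
import numpy as np
import torch


def assert_iree_matches_eager(eager_out: torch.Tensor, iree_out, atol: float, rtol: float = 0.0):
    """Compare an IREE result against the eager torch reference."""
    # IREE runtime results typically come back as numpy-convertible device
    # arrays; bring them into torch with the reference dtype before comparing.
    iree_tensor = torch.from_numpy(np.asarray(iree_out)).to(eager_out.dtype)
    torch.testing.assert_close(iree_tensor, eager_out, atol=atol, rtol=rtol)
```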

@PhaneeshB (Contributor, Author):

From what I understand of the requirements for this layer-based testing, I've structured each unit (layer) test to do just that: compare torch vs IREE for accuracy (which, in a passing test, consequently validates export, compilation, and vmfb execution).

> We don't want to include execution: once exported, the module can be invoked multiple times for testing.

So when you say multiple invocations, what changes between those invocations?
I'm trying to understand whether it's about using the same vmfb with different input args (in a single test), or whether you have some other use case in mind.

Since this is a layer-wise test we are working with random inputs; my thinking has been to add a simple-to-set-up test that can do a torch-vs-IREE run for a given module, including export (a sketch of what that could look like is below).
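A minimal sketch of such a test, reusing the hypothetical iree_backed_module / assert_iree_matches_eager helpers from the sketches above; none of these names (or the main entry-point name) exist in sharktank:

```python
import pytest
import torch

# Hypothetical helpers from the earlier sketches; this module path is made up.
from iree_vs_eager_helpers import assert_iree_matches_eager, iree_backed_module


@pytest.mark.parametrize("dtype,atol", [(torch.float32, 1e-4), (torch.float16, 1e-3)])
def test_layer_iree_vs_eager(dtype, atol):
    torch.manual_seed(0)
    layer = torch.nn.Linear(64, 64, bias=False, dtype=dtype)

    x0 = torch.randn(2, 64, dtype=dtype)
    # Export + compile + load once...
    vm_module = iree_backed_module(layer, (x0,))
    # ...then invoke the same loaded module with different input args
    # (the "same vmfb with different input args" case).
    for x in (x0, torch.randn(2, 64, dtype=dtype)):
        eager_out = layer(x)
        iree_out = vm_module.main(x.numpy())
        assert_iree_matches_eager(eager_out, iree_out, atol=atol)
```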

return module.forward(*fn_args)

export_output = export(fxb, import_symbolic_shape_expressions=True)
export_output.save_mlir(mlir_path)
Contributor:

It shouldn't be necessary to save the IR to a file. Could this be a flag that lets devs save the IR to a path for debugging, but otherwise just use the exported version directly?
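A possible shape for that, as a sketch; the export_module wrapper and mlir_debug_path parameter are made-up names, not existing sharktank API:

```python
from typing import Optional

from iree.turbine import aot


def export_module(module, example_args: tuple, mlir_debug_path: Optional[str] = None):
    """Export `module`; optionally persist the MLIR for debugging."""
    export_output = aot.export(module, *example_args)
    if mlir_debug_path is not None:
        # Opt-in only: write the IR out when a developer wants to inspect it.
        export_output.save_mlir(mlir_debug_path)
    # Otherwise hand back the in-memory export and compile from it directly.
    return export_output
```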

@PhaneeshB (Contributor, Author):

For the current use case we are saving this in a tmp dir.
From a standalone-method point of view, we just expect it to export the IR along with the outputs.


Args:
module: torch.nn.Module under test
args: example positional inputs (tuple required)
Contributor:

input_args would make this clearer.

@pytest.mark.parametrize("dtype,atol", [(torch.float32, 1e-4), (torch.float16, 1e-4)])
def test_linear_iree_vs_eager(dtype, atol):
torch.manual_seed(0)
m = Linear(64, 64, bias=False, dtype=dtype)
Contributor:

We should be testing this linear against our sharktank linear layer

@PhaneeshB (Contributor, Author):

That might not be what we are testing here: for the layer-based tests we are comparing torch vs IREE execution for the different sharktank LLM model layers.

The goal of the layer-based test suite is to have layers that correspond to the actual LLM model layers, across the (Llama) variants we export in sharktank.

To get started, we added tests with a close but dummy implementation (same structure, smaller size, random weights); the current linear layer test is one of these dummy tests.

As a next step we want to update this to create the linear layer from an irpa file. When adding that test we will keep this one as a mock test, so the test can still run without an irpa file (see the sketch below).
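One way to keep both variants side by side, as a sketch (the SHARKTANK_TEST_IRPA environment variable and the test names are made up):

```python
import os

import pytest

# Path to an irpa file with real weights; unset means only the mock test runs.
IRPA_PATH = os.environ.get("SHARKTANK_TEST_IRPA")


def test_linear_iree_vs_eager_random_weights():
    ...  # mock/dummy variant: same structure, smaller size, random weights


@pytest.mark.skipif(IRPA_PATH is None, reason="no irpa file provided")
def test_linear_iree_vs_eager_from_irpa():
    ...  # build the linear layer from the irpa file at IRPA_PATH
```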

)

# Output linear layer (language model head)
self.output_lm_head = LinearLayer(
Contributor:

This linear layer is a separate sharktank "layer". I would stick to the RMSNormLayer above for this test and leave this LinearLayer to the linear test above.

@PhaneeshB (Contributor, Author):

Yup, this is a composite test, since RMS norm + linear is the last step in producing the logits output (roughly the shape sketched below).
This serves as a starting point for adding other tests.
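For reference, a plain-torch sketch of that composite (not the actual sharktank test module; layer sizes and eps are illustrative):

```python
import torch


class OutputLMHeadSketch(torch.nn.Module):
    """RMS norm followed by the lm_head projection that produces the logits."""

    def __init__(self, hidden_dim: int, vocab_size: int, eps: float = 1e-6):
        super().__init__()
        self.norm_weight = torch.nn.Parameter(torch.ones(hidden_dim))
        self.eps = eps
        self.lm_head = torch.nn.Linear(hidden_dim, vocab_size, bias=False)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # RMS norm: scale by the reciprocal RMS, then apply the learned weight.
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        normed = hidden_states * torch.rsqrt(variance + self.eps) * self.norm_weight
        # Output projection ("language model head") yields the logits.
        return self.lm_head(normed)
```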

return logits


def create_output_lm_head_from_irpa(irpa_path: str) -> tuple[OutputLMHead, torch.Tensor]:
Contributor:

We shouldn't need to depend on an irpa file. We can create a theta like we do here and use that for testing.
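For example, something along these lines; this assumes sharktank.types.Theta and DefaultPrimitiveTensor can be built from plain torch tensors, and the exact constructor arguments and the "weight" key are assumptions to be checked against the existing layer tests:

```python
import torch

from sharktank.types import DefaultPrimitiveTensor, Theta


def make_lm_head_theta(hidden_dim: int, vocab_size: int, dtype=torch.float32) -> Theta:
    """Build a random-weight theta for the output projection instead of reading an irpa file."""
    return Theta(
        {
            "weight": DefaultPrimitiveTensor(
                name="weight", data=torch.rand(vocab_size, hidden_dim, dtype=dtype)
            ),
        }
    )
```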

import pytest
from sharktank.utils._helpers import run_iree_vs_torch_fx

class RMSNorm(torch.nn.Module):
Contributor:

This test feels unnecessary if we test RMSNormLayer above

@PhaneeshB (Contributor, Author):

It will be used as a mock test.
As a next step, this will also contain an RMS norm layer test created from the parameters, which is currently part of output_lm_test_with_iree.

@PhaneeshB force-pushed the layers_test branch 8 times, most recently from 5a5c892 to 66e3e61 on September 30, 2025 09:11
github-actions bot commented on Sep 30, 2025:

Coverage report

Lines missing coverage in new statements, by file:

sharktank:
  conftest.py
sharktank/sharktank/utils:
  _iree_compile_flags_config.py
  iree.py: 771-775, 799-833, 860-884, 903-911, 942-982
  testing.py: 785-798
sharktank/tests/layers:
  ffn_with_iree_test.py: 15-19, 22-25, 43-46
  linear_with_iree_test.py: 17-18, 21, 39-42
  output_lm_head_with_iree_test.py: 24-34, 41-46, 62-88, 104-117, 136-188
  rms_norm_with_iree_test.py: 17-20, 24-27, 33-37
  rotary_embedding_hf_test.py: 428, 433, 464-472, 528-534
  rotary_embedding_test.py: 183-209, 227-233, 244-250
  sharded_conv2d_with_iree_test.py
  token_embedding_with_iree_test.py: 21-23, 26, 32-40, 53-59, 69-70
sharktank/tests/models/vae:
  vae_test.py

This report was generated by python-coverage-comment-action

@PhaneeshB changed the title from "[WIP] Layers test" to "LLM Layer testcases" on Sep 30, 2025
@PhaneeshB marked this pull request as ready for review on September 30, 2025 13:41
@PhaneeshB force-pushed the layers_test branch 9 times, most recently from 92a0135 to 4f0b012 on October 3, 2025 07:30