LLM Layer testcases #2289
Conversation
PhaneeshB commented on Sep 19, 2025
- Add dummy layers test for most llama layers
- Add real weights test for rms_norm + lm_head layers
Let's try to focus in on a reusable component rather than a monolithic end-to-end runner. We want a thing that takes a torch.Module and generates an equivalent module that uses IREE as a backend. This thing should do the following:
- Export the MLIR functions from the module
- Compile the exported MLIR
- Load the compiled module into a runtime with device setup
- Return the invokable module

We don't want to include execution, since once exported the module could be invoked multiple times by tests.
As a simple V0, let's assume a single forward function: export the module and reload it into an IREE runtime.
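A minimal sketch of such a component, assuming iree-turbine's `aot.export` and `iree.runtime.load_vm_flatbuffer_file`; the helper name and the temp path are illustrative, not existing repo code:

```python
import tempfile

import torch
import iree.runtime as ireert
from iree.turbine import aot


def iree_backed_module(module: torch.nn.Module, example_args: tuple,
                       driver: str = "local-task"):
    """Export `module` to MLIR, compile it, and return an invokable IREE module."""
    # 1. Export the forward function to MLIR.
    export_output = aot.export(module, *example_args)
    # 2. Compile the exported MLIR to a VM flatbuffer.
    vmfb_path = tempfile.mktemp(suffix=".vmfb")  # illustrative location
    export_output.compile(save_to=vmfb_path)
    # 3. Load the compiled module into a runtime with the device set up, and
    # 4. return the invokable module; execution is left to the caller, which
    #    can invoke it as many times as needed (e.g. iree_module.main(x)).
    return ireert.load_vm_flatbuffer_file(vmfb_path, driver=driver)
```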
As a first step in this direction, I've pushed changes that refactor the existing method to:
- Export the module and capture eager outputs
- Compile the exported MLIR
- Run with IREE to generate outputs
- Compare eager vs IREE outputs
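For illustration, those four steps could be wrapped in a single comparison helper along these lines; this is a sketch reusing the hypothetical `iree_backed_module` above, and the PR's actual helper (`run_iree_vs_torch_fx`, imported later in the diff) may differ in signature:

```python
import numpy as np
import torch


def compare_eager_vs_iree(module: torch.nn.Module, args: tuple, atol: float = 1e-4):
    # Capture the eager (reference) outputs.
    eager_out = module(*args)
    # Export, compile, and load via the component sketched above.
    iree_module = iree_backed_module(module, args)
    # Run with IREE to generate outputs.
    iree_out = iree_module.main(*[a.numpy() for a in args])
    # Compare eager vs IREE outputs.
    torch.testing.assert_close(
        torch.as_tensor(np.asarray(iree_out)), eager_out, atol=atol, rtol=0
    )
```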
From what I understand of the requirement for this layer-based testing, I've structured each unit (layer) test to do just that: compare torch vs IREE for accuracy (which, in a passing test, consequently validates export, compilation, and vmfb execution).

> We don't want to include execution, since once exported the module could be invoked multiple times by tests.

When you say multiple invocations, what changes between those invocations? I'm trying to understand whether it's about using the same vmfb with different input args (in a single test), or whether you have some other use case in mind. Since this is a layer-wise test we are working with random inputs; my thinking has been to add a simple-to-setup test that can do a torch-vs-IREE run for a given module, including export.
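To make the "multiple invocations" case concrete: once exported and compiled, the same loaded module can be driven with different input args without re-exporting. A sketch, again using the hypothetical `iree_backed_module` helper:

```python
import numpy as np
import torch

m = torch.nn.Linear(64, 64, bias=False)
# Export and compile once ...
iree_m = iree_backed_module(m, (torch.randn(4, 64),))

# ... then invoke the same compiled module repeatedly with different inputs.
for _ in range(3):
    x = torch.randn(4, 64)
    got = torch.as_tensor(np.asarray(iree_m.main(x.numpy())))
    torch.testing.assert_close(got, m(x), atol=1e-4, rtol=0)
```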
```python
return module.forward(*fn_args)
...
export_output = export(fxb, import_symbolic_shape_expressions=True)
export_output.save_mlir(mlir_path)
```
It shouldn't be necessary to save the IR to a file. Could this be a flag so that devs can save the IR to a path for debugging, but otherwise use the exported version directly?
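One way to phrase that as a flag, keeping the in-memory export as the default and only writing the IR when a debug path is passed; the parameter and function names here are illustrative, while `export`, `fxb`, and `import_symbolic_shape_expressions` come from the snippet above:

```python
from pathlib import Path
from typing import Optional

from iree.turbine.aot import export


def export_programs(fxb, save_mlir_to: Optional[Path] = None):
    export_output = export(fxb, import_symbolic_shape_expressions=True)
    if save_mlir_to is not None:
        # Only persist the IR when a developer explicitly asks for it.
        export_output.save_mlir(save_mlir_to)
    return export_output
```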
For the current use case we are saving this in a tmp dir. From a standalone-method point of view, we just expect it to export the IR along with the outputs.
```python
Args:
    module: torch.nn.Module under test
    args: example positional inputs (tuple required)
```
`input_args` would make this clearer.
```python
@pytest.mark.parametrize("dtype,atol", [(torch.float32, 1e-4), (torch.float16, 1e-4)])
def test_linear_iree_vs_eager(dtype, atol):
    torch.manual_seed(0)
    m = Linear(64, 64, bias=False, dtype=dtype)
```
We should be testing this linear against our sharktank linear layer
That might not be what we are testing here. For the layer-based tests we are comparing torch vs IREE execution for different sharktank LLM model layers.
The goal of the layer-based test suite is to cover layers that correspond to actual LLM model layers across variants (for llama) that we export in sharktank. To get started, we added tests with a close-but-dummy implementation (same structure, smaller size, random weights); the current linear layer test is one of these dummy tests. In the next steps we want to update this to create the linear layer using an irpa file, and when adding that test we shall keep this one as a mock test so the suite can run without an irpa file.
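For when that step lands, a rough sketch of a LinearLayer test built from an in-memory theta rather than an irpa file; it assumes sharktank's `Theta`, `DefaultPrimitiveTensor`, and `LinearLayer` APIs roughly as used elsewhere in the repo, and reuses the hypothetical comparison helper sketched earlier:

```python
import torch
from sharktank.layers import LinearLayer
from sharktank.types import DefaultPrimitiveTensor, Theta


def make_linear_layer(in_dim=64, out_dim=64, dtype=torch.float32):
    # Random weights stand in for real irpa parameters in the mock variant.
    weight = torch.rand(out_dim, in_dim, dtype=dtype)
    theta = Theta({"weight": DefaultPrimitiveTensor(name="weight", data=weight)})
    return LinearLayer(theta)


def test_sharktank_linear_iree_vs_eager():
    torch.manual_seed(0)
    layer = make_linear_layer()
    x = torch.rand(4, 64)
    compare_eager_vs_iree(layer, (x,), atol=1e-4)
```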
```python
)

# Output linear layer (language model head)
self.output_lm_head = LinearLayer(
```
This linear layer is a separate sharktank "layer". I would just stick to the RMSNormLayer above for this test and leave this LinearLayer to the linear test above.
Yup, this is a composite test since RMSNorm + Linear is the last step in getting the logits output. It serves as a starting point for adding other tests.
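A compressed illustration of that composite (RMSNorm feeding the lm_head projection): the sharktank `RMSNormLayer`/`LinearLayer` constructor arguments here are assumptions, the shapes are toy-sized, and this is not the PR's actual class:

```python
import torch
from sharktank.layers import LinearLayer, RMSNormLayer
from sharktank.types import DefaultPrimitiveTensor, Theta


class OutputLMHead(torch.nn.Module):
    """RMSNorm + lm_head linear: the final step that produces the logits."""

    def __init__(self, hidden_dim=64, vocab_size=128, dtype=torch.float32):
        super().__init__()
        norm_theta = Theta(
            {"weight": DefaultPrimitiveTensor(name="weight", data=torch.rand(hidden_dim, dtype=dtype))}
        )
        head_theta = Theta(
            {"weight": DefaultPrimitiveTensor(name="weight", data=torch.rand(vocab_size, hidden_dim, dtype=dtype))}
        )
        self.norm = RMSNormLayer(norm_theta, epsilon=1e-5)
        self.output_lm_head = LinearLayer(head_theta)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.output_lm_head(self.norm(h))
```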
```python
    return logits


def create_output_lm_head_from_irpa(irpa_path: str) -> tuple[OutputLMHead, torch.Tensor]:
```
We shouldn't need to depend on an irpa file. We can create a theta, like we do here, and use that for testing.
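For illustration, the irpa-backed construction could be swapped for an in-memory theta of random tensors; `Dataset.load` is shown only to mark what it would replace, and the `Theta`/`DefaultPrimitiveTensor` usage is the same assumed API as in the earlier sketches:

```python
import torch
from sharktank.types import DefaultPrimitiveTensor, Theta
# Irpa-backed path this would replace:
#   from sharktank.types import Dataset
#   theta = Dataset.load(irpa_path).root_theta

# In-memory theta with random weights; no irpa file required for the test.
theta = Theta(
    {"weight": DefaultPrimitiveTensor(name="weight", data=torch.rand(128, 64))}
)
```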
```python
import pytest
from sharktank.utils._helpers import run_iree_vs_torch_fx


class RMSNorm(torch.nn.Module):
```
This test feels unnecessary if we test RMSNormLayer above
It will be used as a mock test. In the next steps this will also contain an RMSNorm layer test created using the parameters, which is currently part of output_lm_test_with_iree.