[mlir-gen] Add mlir builders for llama3.1 and tests #13
Conversation
Should this be in `examples`?

The e2e should be, yup, but this is mostly tests and getters.

I moved the whole thing to examples and added attention to the list of tests.
rengolin left a comment:
Thanks!
rolfmorel left a comment:
Nice! Have left some comments inline.
```yaml
- name: Run pytest-enabled examples as tests
  run: |
    uv run pytest python/examples
```
Could we instead integrate with lit? That is, would it work if we just added the line `# RUN: %PYTHON pytest %s` at the top of test_llama3.py?

There's value in trying to preserve being able to just `lit $PATH_WITHIN_PROJECT` to run the respective tests (including `PATH_WITHIN_PROJECT=.` in the root).
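For concreteness, a sketch of what the top of test_llama3.py might then look like (hedged: `%PYTHON` expands to the Python interpreter, so pytest is likely best invoked as a module via `-m`; the exact RUN line may need tweaking):

```python
# RUN: %PYTHON -m pytest %s
#
# lit executes the RUN line above; pytest then collects the tests in
# this file as usual, so the same file works under both runners.

import pytest


def test_smoke():
    assert True
```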
```python
        (get_matmul, (16, 16), "f32"),
        (get_outer, (16,), "f32"),
    ],
)
```
My suggestion would be to define a decorator, something like:

```python
from typing import Callable
from mlir import ir

def with_context_and_unknown_location(f: Callable):
    def wrapped(*args, **kwargs):
        with ir.Context(), ir.Location.unknown():
            f(*args, **kwargs)
    return wrapped
```

Just annotate all your pytest tests with that and completely forget about needing to deal with context and location (and entering context managers) throughout the rest of the code. That is, most APIs will pick up the current context and location automatically. For the APIs that don't, there are the `mlir.ir.Context.current` and `mlir.ir.Location.current` escape hatches.
Some context: #20 (comment) and #20 (comment)
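For illustration, a test annotated with the decorator could then be as simple as (test body hypothetical, not from this PR):

```python
from mlir import ir


@with_context_and_unknown_location
def test_module_roundtrip():
    # ir.Context.current and ir.Location.current are set implicitly here,
    # so no explicit `with` blocks are needed in the test body.
    module = ir.Module.parse("module {}")
    assert module.operation.verify()
```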
```python
def create_pass_pipeline(ctx: ir.Context) -> PassManager:
    with ctx:
        pm = PassManager("builtin.module")
```

Suggested change:

```diff
-def create_pass_pipeline(ctx: ir.Context) -> PassManager:
-    with ctx:
-        pm = PassManager("builtin.module")
+def create_pass_pipeline() -> PassManager:
+    pm = PassManager("builtin.module")
```
See comment below for how to elide dealing with contexts in most places.
The above holds for many functions in this file.
For this particular function, its passes should just become the suffix of the schedule, so that we end up with a single end-to-end schedule for the entire MLIR lowering.
See https://github.com/libxsmm/tpp-mlir/blob/37a498bd1e320e00fa50e3323cbaac2867cd7a1e/python/mlir/tpp/sched/bundles.py#L41-L43 for an example of dealing with passes that expect to run on particular ops w.r.t. the root module.
```python
# Create entry point transformation sequence.
with ir.InsertionPoint(schedule.body):
    named_seq = transform.NamedSequenceOp(
```
Suggested change:

```diff
-    named_seq = transform.NamedSequenceOp(
+    named_seq = transform.named_sequence(
```

As a general principle we can set for ourselves right at the start of the project: if the upstream bindings already have a workable `snake_case` version of an op, let's use it over the `CamelCaseOp` version. The crux of the argument is that this makes the Python code look closer to a terse version of the MLIR textual format.
```python
    [xq_scores_map, keys_scores_map, scores_map],
    [parallel, parallel, parallel, parallel, reduction],
)
def compute_scores(q_val, k_val, score_val):
```
Could be written as a `linalg.contract`, right?
```python
    [parallel] * 4,
)
def scale_scores(score, _out):
    return arith.MulFOp(score, scale_const).result
```
Suggested change:

```diff
-    return arith.MulFOp(score, scale_const).result
+    return arith.mulf(score, scale_const)
```
If you use `arith.mulf` (or whatever the snake_case version is called), you should be able to elide the `.result`. This holds generally (for single-result ops, that is).
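To spell out the difference, a minimal self-contained sketch (function and variable names are illustrative, not from this PR):

```python
from mlir import ir
from mlir.dialects import arith, func

with ir.Context(), ir.Location.unknown():
    module = ir.Module.create()
    f32 = ir.F32Type.get()
    with ir.InsertionPoint(module.body):
        @func.FuncOp.from_py_func(f32, f32)
        def scale(score, scale_const):
            # CamelCaseOp builder: returns the op, so .result is needed.
            a = arith.MulFOp(score, scale_const).result
            # snake_case builder: returns the Value directly (single-result op).
            return arith.mulf(a, scale_const)
```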
```python
        anytype, mod, "convert-linalg-to-loops"
    )
    # Cleanup.
    transform.ApplyCommonSubexpressionEliminationOp(mod)
```
I believe `transform.apply_cse` is the snake_case version, though I might be wrong.
```python
module = generate_module(ctx, ir_type)
bufferize_module(ctx, module)
schedule = create_schedule(ctx)
apply_schedule(module, schedule)
pm = create_pass_pipeline(ctx)
pm.run(module.operation)
```
Suggested change:

```diff
-module = generate_module(ctx, ir_type)
-bufferize_module(ctx, module)
-schedule = create_schedule(ctx)
-apply_schedule(module, schedule)
-pm = create_pass_pipeline(ctx)
-pm.run(module.operation)
+module = generate_module(ctx, ir_type)
+schedule = create_schedule(ctx)
+apply_schedule(module, schedule)
```
Just move the passes from inside `bufferize_module(ctx, module)` and `create_pass_pipeline(ctx)` into the start and end of the schedule, i.e. with `transform.apply_registered_pass`. I know this antipattern originates in an example script we merged, but we should not let it proliferate. It is clearly already confusing people.
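A sketch of what that could look like (hedged: pass names and the `__transform_main` entry point are illustrative; the `(result_type, target, pass_name)` argument order follows the `ApplyRegisteredPassOp` call already quoted in this PR, and exact builder signatures may differ across MLIR versions):

```python
from mlir import ir
from mlir.dialects import transform


def create_schedule() -> ir.Module:
    # Assumes an active ir.Context / ir.Location, e.g. via the decorator
    # suggested earlier in this review.
    schedule = ir.Module.create()
    schedule.operation.attributes["transform.with_named_sequence"] = ir.UnitAttr.get()
    anytype = transform.AnyOpType.get()
    with ir.InsertionPoint(schedule.body):
        named_seq = transform.named_sequence("__transform_main", [anytype], [])
        with ir.InsertionPoint(named_seq.body):
            mod = named_seq.bodyTarget
            # ... the existing tiling/fusion transforms on `mod` go here ...
            # What used to be bufferize_module and create_pass_pipeline
            # becomes the suffix of the schedule:
            mod = transform.apply_registered_pass(anytype, mod, "one-shot-bufferize")
            mod = transform.apply_registered_pass(anytype, mod, "convert-linalg-to-loops")
            mod = transform.apply_registered_pass(anytype, mod, "canonicalize")
            transform.yield_([])
    return schedule
```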
```python
    return schedule


def apply_schedule(kernel: ir.Module, schedule: ir.Module) -> None:
    interpreter.apply_named_sequence(
        payload_root=kernel,
        transform_root=schedule.body.operations[0],
        transform_module=schedule,
    )
```
Suggested change:

```diff
-    return schedule
-
-
-def apply_schedule(kernel: ir.Module, schedule: ir.Module) -> None:
-    interpreter.apply_named_sequence(
-        payload_root=kernel,
-        transform_root=schedule.body.operations[0],
-        transform_module=schedule,
-    )
+    return named_seq
```
If we do this, you can simply do:

```python
schedule = create_schedule()
schedule.apply(module)
```

If you need access to the Module around the named_sequence, just ask for its `.parent`.
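Presumably that `.apply` would be a thin helper around the interpreter, something like the following hypothetical sketch (the keyword arguments mirror the `apply_named_sequence` call quoted above; `named_seq.parent` is the enclosing transform module, though its exact typing may vary across binding versions):

```python
from mlir import ir
from mlir.dialects.transform import interpreter


def apply(named_seq, payload: ir.Module) -> None:
    # Run the transform interpreter with `named_seq` as the entry point
    # and its parent module as the transform module.
    interpreter.apply_named_sequence(
        payload_root=payload,
        transform_root=named_seq,
        transform_module=named_seq.parent,
    )
```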
```diff
@@ -1,2 +1,2 @@
 import ctypes
 import torch
```
I know this PR didn't introduce it, but looking at it now, I feel we should think about compartmentalizing code that depends on heavy dependencies a bit more. That is, not keep it in the same module as code that doesn't need the dependency, e.g. `get_packed_arg`.
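For instance, a hypothetical split (module and function names are placeholders) would keep torch behind its own module, so that ctypes-only helpers like `get_packed_arg` stay importable without it:

```python
# torch_ref.py (hypothetical module): the only file that imports torch.
# Reference implementations used for numerical checks live here.
import torch


def reference_rms_norm(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Naive RMS norm to compare against the MLIR-generated kernel.
    return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps)
```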
Putting up this dirty draft for early feedback/questions. I'm putting together some tests to run an e2e llama3.1 model going through linalg on tensors. The goal is to generate nice linalg that is optimization-friendly. At the moment there are just functional blocks and pieces that are only smoke-tested: naive implementations of rotary embeddings, feed-forward, RMS norm, and a bunch of other small snippets that are useful for implementing the model. These are already enough to put an attention block together. It would be nice to test it against the original implementation, but that would require fairscale as a dependency. For now I only added pytest and kept the pipeline as simple as possible. I also reused the example with the schedule, so it is now part of every test.