Replies: 2 comments
In a discussion via a video call:

> I think the idea with attributes is more practical. If we have to, we can move
Tracking issue: #23535
## Background

As of writing, the tuner flow is as follows: the tuner parses dispatches and generates candidate `compilation_info` attributes.

The problem with this approach is that the tuner has to know a lot about the IREE compiler and its compilation pipelines (e.g., `LLVMGPUVectorDistribute` or `LLVMGPUTileAndFuse`), including what inputs they can compile, their configuration space, their `lowering_config` format, etc. This is partially encoded directly as Python logic and partially exposed through Python bindings.

## High-level Proposal
I propose to move most of the responsibilities of dispatch parsers to the IREE compiler itself and have:
- A new `iree_codegen.constraints` op that encodes the pipeline constraints of one or more root ops. There may be multiple constraint ops, which are disjoint per compilation pipeline. This happens before dispatch configuration.
- A new pass that generates `iree_codegen.constraints` ops.
- A new Python binding that collects `iree_codegen.constraints` ops and serializes them to SMTLIB strings. The tuner will use this binding to collect constraint sets. Uses the pass mentioned above.
- A verifier that checks whether `lowering_config` and `translation_info` satisfy the constraints. This is not strictly needed for the tuner, but it allows us to make sure that the compiler respects its own constraints, which are currently not verified until it's too late (e.g., deep in vector distribution).

We don't have to implement this for all the possible pipelines, just like the tuner doesn't support all the backends and pipelines today.
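As a self-contained sketch of the division of labor this sets up, the snippet below models "the solver ingests constraints and finds concrete knob assignments" with a brute-force search over tiny knob domains. The real tuner would hand SMTLIB strings to an SMT solver instead; all knob names and constraints here are made up for illustration.

```python
from itertools import product

# Hypothetical knob domains for a tile-size search; in the proposal, the
# constraint set would arrive as an SMTLIB string produced by the new binding.
knob_domains = {
    "workgroup_m": [32, 64, 128],
    "workgroup_n": [32, 64, 128],
    "subgroup_size": [32, 64],
}

# Hypothetical pipeline constraints over the knobs.
constraints = [
    lambda a: a["workgroup_m"] * a["workgroup_n"] <= 64 * 128,
    lambda a: a["workgroup_m"] % a["subgroup_size"] == 0,
]

def solve_all(domains, constraints):
    """Enumerate every knob assignment that satisfies all constraints."""
    names = list(domains)
    for values in product(*(domains[n] for n in names)):
        assignment = dict(zip(names, values))
        if all(c(assignment) for c in constraints):
            yield assignment

candidates = list(solve_all(knob_domains, constraints))
```

Each element of `candidates` is a concrete knob assignment that the tuner would then turn back into `compilation_info` attributes.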
## Implementation
You can find my prototype here:
I checked it on a simple matmul, and both VectorDistribute and TileAndFuse produce the same set of candidate tuning specs as before.
### `iree_codegen.constraints`

The key design consideration was to decide how to capture the following:
- Shapes. These are captured as `tensor.dim` operands, so that we can deal with both dynamic and static shapes. Assumptions and static sizes may end up getting constant-folded.
- Tuning knobs and how they map to `lowering_config`/`translation_info`. Whatever constraints we emit, the solver has to ingest and find concrete knob assignments for, and the tuner has to turn back into concrete attributes. I decided to have it structurally match the `compilation_info` dictionary attribute, with leaf (value) attributes as new knob attributes.

#### Example 1: Simple `linalg.fill` op

#### Example 2: Realistic `linalg.matmul` op
### Knob materialization

It's trivial to materialize most knobs, as most leaf values are integers wrapped with `ArrayAttr`/`DictionaryAttr`. What's more challenging is handling something like `iree_gpu.mma_layout`. In my prototype, lowering config materialization is mostly hardcoded: https://github.com/kuhar/iree/blob/8bf5b267287d3288435532aecdb5abb422879255/compiler/src/iree/compiler/Codegen/Dialect/GPU/IR/GPULoweringConfigUtils.cpp#L239.

`materializeCompilationInfo(constraintsOp, assignment, gpuTarget, elemTypes)`:

1. `IntKnobAttr` → value from assignment, kept as `IntegerAttr`
2. `findMMAAttrByShape(target, m, n, k, types)`
3. `LoweringConfigAttr(workgroup, reduction, subgroup, thread, mma_kind, basis)`
4. `TranslationInfoAttr(pipeline, workgroup_size, subgroup_size)`
5. `CompilationInfoAttr`

In the proper implementation, we may need some interface so that pipelines can register known `compilation_info` keys as interfaces that materialize them from knob attributes + concrete assignments.
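The materialization steps can be approximated with a small Python model. Everything in it — the dict shapes, knob names, and the intrinsic table — is a hypothetical stand-in for the real C++ attributes and helpers, not IREE's actual API.

```python
# Hypothetical MMA intrinsic table keyed by (m, n, k) shape; a stand-in for
# findMMAAttrByShape over the GPU target's intrinsic list.
MMA_INTRINSICS = {(16, 16, 16): "MFMA_F32_16x16x16_F16"}

def find_mma_attr_by_shape(m, n, k):
    """Mock of findMMAAttrByShape: map an intrinsic shape to an mma_kind."""
    return MMA_INTRINSICS.get((m, n, k))

def materialize_compilation_info(assignment):
    """Turn a concrete knob assignment back into a compilation_info-shaped dict."""
    lowering_config = {
        # IntKnobAttr values come straight from the solver assignment.
        "workgroup": [assignment["workgroup_m"], assignment["workgroup_n"], 0],
        "reduction": [0, 0, assignment["reduction_k"]],
        "mma_kind": find_mma_attr_by_shape(
            assignment["intrinsic_m"], assignment["intrinsic_n"], assignment["intrinsic_k"]
        ),
    }
    translation_info = {
        "pipeline": "LLVMGPUVectorDistribute",
        "workgroup_size": [assignment["subgroup_size"] * assignment["num_subgroups"], 1, 1],
        "subgroup_size": assignment["subgroup_size"],
    }
    return {"lowering_config": lowering_config, "translation_info": translation_info}

info = materialize_compilation_info({
    "workgroup_m": 64, "workgroup_n": 64, "reduction_k": 32,
    "intrinsic_m": 16, "intrinsic_n": 16, "intrinsic_k": 16,
    "subgroup_size": 64, "num_subgroups": 4,
})
```

The point of the structural match is visible here: the output nests exactly like the `compilation_info` dictionary attribute, so turning an assignment back into attributes is mostly a leaf-by-leaf substitution.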
### New bindings

#### C API (`iree_codegen.h`)

- `ireeCodegenGetConstraintsOps(module, ...)`
- `ireeCodegenConstraintsOpToSMTLIB(op)`
- `ireeCodegenMaterializeCompilationInfo(op, names, values, n)`
- `ireeCodegenIntKnobAttrGet{TypeID,Name}`
- `ireeCodegenMMAKnobAttrGet{TypeID,MName,NName,KName}`

#### Python bindings (`IREECompilerDialectsModule.cpp`)