"summary": "Swap the Value of the invocation within the quad with another invocation\n in the quad using Direction.",
+ "description": "Result Type must be a scalar or vector of floating-point type, integer type,\n or Boolean type.\n\n Execution is a Scope, but has no effect on the behavior of this instruction.\n It must be Subgroup.\n\n The type of Value must be the same as Result Type.\n\n Direction is the kind of swap to perform.\n\n Direction must be a scalar of integer type, whose Signedness operand is 0.\n\n Direction must come from a constant instruction.\n\n The value returned in Result is the value provided to Value by another invocation\n in the same quad scope instance. The invocation providing this value is\n determined according to Direction.\n\n A Direction of 0 indicates a horizontal swap;\n - Invocations with quad indices of 0 and 1 swap values\n - Invocations with quad indices of 2 and 3 swap values\n A Direction of 1 indicates a vertical swap;\n - Invocations with quad indices of 0 and 2 swap values\n - Invocations with quad indices of 1 and 3 swap values\n A Direction of 2 indicates a diagonal swap;\n - Invocations with quad indices of 0 and 3 swap values\n - Invocations with quad indices of 1 and 2 swap values\n\n Direction must be one of the above values.\n\n If a tangled invocation within the quad reads Value from an invocation not part\n of the tangled invocation within the same quad, the resulting value is undefined.\n\n An invocation will not execute a dynamic instance of this instruction (X') until\n all invocations in its quad have executed all dynamic instances that are program-ordered\n before X'.\n\n #### Example:\n\n ```mlir\n %0 = spirv.GroupNonUniformQuadSwap <Subgroup> %value %dir : f32, i32\n %1 = spirv.GroupNonUniformQuadSwap <Subgroup> %value %dir : vector<4xf32>, i32\n ```",
"summary": "Rotate values across invocations within a subgroup.",
@@ -107994,13 +108015,17 @@
{
"name": "ttg.global_scratch_alloc",
"summary": "allocate a global memory buffer",
- "description": "This operation allocates a buffer in global memory that is private to the current program.",
+ "description": "This operation allocates a buffer in global memory that is private to the current program.\n The `backend` attribute specifies the backend to use for allocation.\n The `default` backend is used by TritonGPU passes.\n Downstream Triton tools and compilers can register a different backend and use a different allocation policy.",
"summary": "Gather elements from shared memory along a specified axis",
+ "description": "Gather elements from a shared memory descriptor using an indices tensor along a\n single specified axis. The output tensor has the same shape as the indices tensor.\n\n For each output position I, the operation reads from src where the coordinate at\n the gather axis is replaced by indices[I]:\n result[I] = src[I[0], ..., indices[I], ..., I[n]]\n where the axis dimension is replaced by the index value.\n\n This matches the behavior of tt.gather but operates on shared memory descriptors.",
"summary": "Scatter elements to shared memory along a specified axis",
+ "description": "Scatter elements to a shared memory descriptor using an indices tensor along a\n single specified axis. The values tensor has the same shape as the indices tensor.\n\n For each input position I, the operation writes to dst where the coordinate at\n the scatter axis is replaced by indices[I]:\n dst[I[0], ..., indices[I], ..., I[n]] = values[I]\n where the axis dimension is replaced by the index value.\n\n This is the inverse of local_gather and writes to shared memory at runtime-computed indices.",
"summary": "Store a distributed tensor into a buffer in local memory",
@@ -108187,9 +108244,6 @@
"name": "ttg.warp_specialize",
"summary": "asynchronously execute code on multiple warpgroups",
"description": "The `ttg.warp_specialize` op represents executing different code\n simultaneously on different warp groups. A warp group is a group of\n power-of-2 warps, which can be a different number of warps than in the\n enclosing region.\n\n The \"default\" region of the op represents the code executed by the currently\n executing warp group. This region is allowed to implicitly capture. The op\n contains a number of \"partition\" regions that are isolated from above. They\n must be isolated because these regions represent different layout domains,\n as the number of warps is different.\n\n Semantically, execution of each region starts simultaneously for each warp\n group, and all warp groups are joined at the end of the op.\n\n Example:\n\n ```mlir\n %0 = ttg.warp_specialize(%a, %b)\n default {\n %out = some_operation(%a) // implicit capture of `%a`\n ttg.warp_yield %out : i32\n }\n partition0(%arg0: i32, %arg1: i32) num_warps(8) {\n some_async_dispatch(%arg0, %arg1)\n ttg.warp_return\n }\n partition1(%arg0: i32, %arg1: i32) num_warps(1) {\n some_async_dispatch(%arg0, %arg1)\n ttg.warp_return\n } : (i32, i32) -> i32\n ```",
"summary": "container op for `ttg.warp_specialize`",
"description": "Because MLIR requires entire operations be isolated from above, this op\n contains the actual isolated from above regions of `ttg.warp_specialize`.",