Commit c18b0ea

akuegel authored and Google-ML-Automation committed

Update code links in documentation.

PiperOrigin-RevId: 715330503

1 parent a32e667 · commit c18b0ea

1 file changed: +16 -15 lines

docs/emitters.md (16 additions & 15 deletions)
@@ -38,7 +38,7 @@ The code consists of the following big building blocks:
 
 ## Partitioning
 
-See [computation_partitioner.h](https://github.com/openxla/xla/blob/852d2d2e4abfc7459f50cc958edb68c82e5f9ffe/xla/service/gpu/fusions/mlir/computation_partitioner.h).
+See [computation_partitioner.h](https://github.com/openxla/xla/blob/ca62f3e1bc9ea1d808c3a4de0a78bae7453389eb/xla/codegen/emitters/computation_partitioner.h).
 
 Non-elementwise HLO instructions cannot always be emitted together. Consider the
 following HLO graph:
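To make the partitioning constraint concrete, here is a minimal hand-written sketch; the ops, names, and shapes are invented for illustration and are not taken from the file above. When a value is consumed under two different indexings (say, directly and through a `transpose`), each indexing gets its own private function rather than one shared loop body:

```
// Hypothetical partitioning result: each subgraph whose root has a
// distinct indexing becomes its own function of (tensor, indices).
func.func private @fused_exp(%arg0: tensor<16x32xf32>, %i: index, %j: index) -> f32 {
  %v = tensor.extract %arg0[%i, %j] : tensor<16x32xf32>
  %e = math.exp %v : f32
  return %e : f32
}

// The transposed consumer reuses @fused_exp, but with swapped indices.
func.func private @fused_transpose(%arg0: tensor<16x32xf32>, %i: index, %j: index) -> f32 {
  %e = func.call @fused_exp(%arg0, %j, %i) : (tensor<16x32xf32>, index, index) -> f32
  return %e : f32
}
```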
@@ -79,7 +79,7 @@ The same is applicable to the following example with `slice` and `pad` of `add`.
 
 ## Elemental emission
 
-See [elemental_hlo_to_mlir.h](https://github.com/openxla/xla/blob/852d2d2e4abfc7459f50cc958edb68c82e5f9ffe/xla/service/gpu/fusions/mlir/elemental_hlo_to_mlir.h).
+See [elemental_hlo_to_mlir.h](https://github.com/openxla/xla/blob/ca62f3e1bc9ea1d808c3a4de0a78bae7453389eb/xla/codegen/emitters/elemental_hlo_to_mlir.h).
 
 Elemental emission creates loops and math/arith ops for `HloInstructions`. For
 the most part, this is straightforward, but there are some interesting things
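In the straightforward case, elemental emission maps one HLO op to a few `tensor.extract` ops plus an `arith`/`math` op. A minimal sketch with invented shapes and names, for an f32 `add` evaluated at index `(%i, %j)`:

```
// Hypothetical elemental emission for add(%arg0, %arg1) at one index.
%lhs = tensor.extract %arg0[%i, %j] : tensor<16x32xf32>
%rhs = tensor.extract %arg1[%i, %j] : tensor<16x32xf32>
%sum = arith.addf %lhs, %rhs : f32
```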
@@ -163,7 +163,7 @@ No other uses of the output tensors are allowed.
 
 ### Loop emitter
 
-See [loop_mlir.h](https://github.com/openxla/xla/blob/852d2d2e4abfc7459f50cc958edb68c82e5f9ffe/xla/service/gpu/fusions/loop_mlir.h#L4).
+See [loop.h](https://github.com/openxla/xla/blob/cfd16b7f21feff17635c782f4489c0f478178eb9/xla/backends/gpu/codegen/emitters/loop.h#L4).
 
 Let's study the most important passes of the MLIR compilation pipeline using the
 HLO for the GELU function.
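For reference, and assuming the widely used tanh approximation of GELU (whether the module uses this or the erf-based variant is not visible in the hunk above), the function being compiled is:

```
\mathrm{gelu}(x) = \frac{x}{2}\left(1 + \tanh\left(\sqrt{\tfrac{2}{\pi}}\,\left(x + 0.044715\,x^{3}\right)\right)\right)
```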
@@ -234,7 +234,7 @@ func.func private @gelu(%arg0: tensor<6x512x4096xbf16>, %i: index, %j: index, %k
 After `@gelu` is inlined, we get a single `@main` function. It can happen that
 the same function is called twice or more. In this case we don't inline. More
 details on the inlining rules can be found in
-[xla_gpu_dialect.cc](https://github.com/openxla/xla/blob/852d2d2e4abfc7459f50cc958edb68c82e5f9ffe/xla/service/gpu/fusions/ir/xla_gpu_dialect.cc).
+[xla_gpu_dialect.cc](https://github.com/openxla/xla/blob/cfd16b7f21feff17635c782f4489c0f478178eb9/xla/backends/gpu/codegen/ir/xla_gpu_dialect.cc).
 
 ```
 func.func @main(%arg0: tensor<6x512x4096xbf16>, %arg1: tensor<6x512x4096xbf16>) -> tensor<6x512x4096xbf16> {
@@ -263,7 +263,7 @@ func.func @main(%arg0: tensor<6x512x4096xbf16>, %arg1: tensor<6x512x4096xbf16>)
 
 #### `xla_gpu` to `scf` conversion
 
-See [lower_xla_gpu_to_scf.cc](https://github.com/openxla/xla/blob/852d2d2e4abfc7459f50cc958edb68c82e5f9ffe/xla/service/gpu/fusions/transforms/lower_xla_gpu_to_scf.cc).
+See [lower_xla_gpu_to_scf.cc](https://github.com/openxla/xla/blob/cfd16b7f21feff17635c782f4489c0f478178eb9/xla/backends/gpu/codegen/transforms/lower_xla_gpu_to_scf.cc).
 
 `xla_gpu.loop` represents a loop nest with a boundary check inside. If the loop
 induction variables are out of bounds of the indexing map domain, then this
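A rough sketch of the shape of that conversion, with invented bounds, types, and a placeholder `@body` function (the real pass derives the bounds checks from the indexing map): one `scf.for` per loop dimension, and an `scf.if` that yields the accumulated tensor unchanged on skipped iterations.

```
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%c1024 = arith.constant 1024 : index
// One scf.for per loop dimension; the scf.if realizes the boundary check.
%result = scf.for %i = %c0 to %c1024 step %c1 iter_args(%acc = %init) -> (tensor<1024xf32>) {
  %in_bounds = arith.cmpi slt, %i, %limit : index
  %next = scf.if %in_bounds -> (tensor<1024xf32>) {
    // In-domain iteration: run the loop body.
    %updated = func.call @body(%acc, %i) : (tensor<1024xf32>, index) -> tensor<1024xf32>
    scf.yield %updated : tensor<1024xf32>
  } else {
    // Out-of-domain iteration: skip, passing the accumulator through.
    scf.yield %acc : tensor<1024xf32>
  }
  scf.yield %next : tensor<1024xf32>
}
```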
@@ -297,7 +297,7 @@ iteration is skipped. This means that the loop is converted to one or more nested
 
 #### Flatten tensors
 
-See [flatten_tensors.cc](https://github.com/openxla/xla/blob/852d2d2e4abfc7459f50cc958edb68c82e5f9ffe/xla/service/gpu/fusions/transforms/flatten_tensors.cc).
+See [flatten_tensors.cc](https://github.com/openxla/xla/blob/cfd16b7f21feff17635c782f4489c0f478178eb9/xla/backends/gpu/codegen/transforms/flatten_tensors.cc).
 
 The N-d tensors are flattened to 1D. This simplifies vectorization and the
 lowering to LLVM, because every tensor access now corresponds to how the data
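The arithmetic behind the flattening is visible in the surrounding IR: `6 * 512 * 4096 = 12582912`, so `tensor<6x512x4096xbf16>` becomes `tensor<12582912xbf16>`, and a 3D index is linearized with row-major strides `512 * 4096 = 2097152` and `4096`. A sketch of the corresponding map (the map name is invented):

```
// Hypothetical row-major linearization map for shape 6x512x4096.
#flat = affine_map<(i, j, k) -> (i * 2097152 + j * 4096 + k)>
```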
@@ -329,7 +329,7 @@ func.func @main(%input: tensor<12582912xbf16>, %output: tensor<12582912xbf16>) -
 
 #### Vectorization
 
-See [vectorize_loads_stores.cc](https://github.com/openxla/xla/blob/852d2d2e4abfc7459f50cc958edb68c82e5f9ffe/xla/service/gpu/fusions/transforms/vectorize_loads_stores.cc).
+See [vectorize_loads_stores.cc](https://github.com/openxla/xla/blob/cfd16b7f21feff17635c782f4489c0f478178eb9/xla/backends/gpu/codegen/transforms/vectorize_loads_stores.cc).
 
 The pass analyses the indices in the `tensor.extract` and `tensor.insert` ops
 and if they are produced by `xla_gpu.apply_indexing` that accesses the elements
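Sketched on the flat `bf16` tensor from this walkthrough, the effect is that consecutive scalar extracts collapse into one wide load. The `vector.transfer_read` form below is just one way to express the widened access and is not claimed to be the pass's exact output; `%base` and `%idx` are invented:

```
// Before: one scalar load per loop iteration.
%elem = tensor.extract %input[%idx] : tensor<12582912xbf16>

// After (illustrative): a single 2-element vector load at a base index.
%pad = arith.constant 0.0 : bf16
%vec = vector.transfer_read %input[%base], %pad : tensor<12582912xbf16>, vector<2xbf16>
```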
@@ -370,7 +370,7 @@ func.func @main(%input: tensor<12582912xbf16>, %output: tensor<12582912xbf16>) -
 
 #### Loop unrolling
 
-See [optimize_loops.cc](https://github.com/openxla/xla/blob/852d2d2e4abfc7459f50cc958edb68c82e5f9ffe/xla/service/gpu/fusions/transforms/optimize_loops.cc).
+See [optimize_loops.cc](https://github.com/openxla/xla/blob/cfd16b7f21feff17635c782f4489c0f478178eb9/xla/backends/gpu/codegen/transforms/optimize_loops.cc).
 
 The loop unrolling finds `scf.for` loops that can be unrolled. In this case, the
 loop over the elements of the vector disappears.
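A minimal sketch of that effect with invented values: a two-trip `scf.for` that sums the elements of a small vector becomes straight-line code, so the loop over vector elements disappears.

```
// Before: a loop with a constant trip count of 2.
%sum = scf.for %i = %c0 to %c2 step %c1 iter_args(%acc = %zero) -> (f32) {
  %v = vector.extract %vec[%i] : f32 from vector<2xf32>
  %next = arith.addf %acc, %v : f32
  scf.yield %next : f32
}

// After unrolling: iterations emitted back to back, no loop.
%v0 = vector.extract %vec[0] : f32 from vector<2xf32>
%s0 = arith.addf %zero, %v0 : f32
%v1 = vector.extract %vec[1] : f32 from vector<2xf32>
%s1 = arith.addf %s0, %v1 : f32
```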
@@ -405,9 +405,9 @@ We cannot use the `memref` lowerings for tensors, since we don't bufferize the
 IR and our ABI is not compatible with the `memref` ABI. Instead, we have a
 custom lowering directly from tensors to `LLVM`.
 
-- The lowering of tensors is done in [lower_tensors.cc](https://github.com/openxla/xla/blob/852d2d2e4abfc7459f50cc958edb68c82e5f9ffe/xla/service/gpu/fusions/transforms/lower_tensors.cc). `tensor.extract` is
+- The lowering of tensors is done in [lower_tensors.cc](https://github.com/openxla/xla/blob/cfd16b7f21feff17635c782f4489c0f478178eb9/xla/backends/gpu/codegen/transforms/lower_tensors.cc). `tensor.extract` is
   lowered to `llvm.load`, `tensor.insert` to `llvm.store`, in the obvious way.
-- [propagate_slice_indices](https://github.com/openxla/xla/blob/852d2d2e4abfc7459f50cc958edb68c82e5f9ffe/xla/service/gpu/fusions/transforms/propagate_slice_indices.cc) and [merge_pointers_to_same_slice](https://github.com/openxla/xla/blob/852d2d2e4abfc7459f50cc958edb68c82e5f9ffe/xla/service/gpu/fusions/transforms/merge_pointers_to_same_slice.cc) together
+- [propagate_slice_indices](https://github.com/openxla/xla/blob/cfd16b7f21feff17635c782f4489c0f478178eb9/xla/backends/gpu/codegen/transforms/propagate_slice_indices.cc) and [merge_pointers_to_same_slice](https://github.com/openxla/xla/blob/cfd16b7f21feff17635c782f4489c0f478178eb9/xla/backends/gpu/codegen/transforms/merge_pointers_to_same_slice.cc) together
   implement a detail of buffer assignment and XLA's ABI: if two tensors share
   the same buffer slice, they are only passed once. These passes deduplicate the
   function arguments.
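A sketch of the first bullet's lowering (pointer names and the index cast are invented; the real pass also handles XLA's ABI details): the flat tensor access turns into address computation plus a plain load.

```
// Before: element access on the flat tensor.
%elem = tensor.extract %buf[%idx] : tensor<12582912xbf16>

// After (roughly): GEP into the underlying buffer, then llvm.load.
%idx64 = arith.index_cast %idx : index to i64
%ptr = llvm.getelementptr %base[%idx64] : (!llvm.ptr, i64) -> !llvm.ptr, bf16
%val = llvm.load %ptr : !llvm.ptr -> bf16
```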
@@ -490,6 +490,7 @@ coalesced writes to the output.
 ### Reproducer
 
 In order to see the IR after every pass of the compilation pipeline, one can launch `run_hlo_module` with the `--v=5` flag.
+
 ```
 run_hlo_module --platform=CUDA --xla_disable_all_hlo_passes --reference_platform="" --v=5 /tmp/gelu.hlo
 ```
@@ -529,9 +530,9 @@ ENTRY main {
 
 ## Links to code
 
-* Compilation pipeline: [mlir_fusion_emitter.h](https://github.com/openxla/xla/blob/61d921cb22abe672d4cf9fdf80b6a63a76ab7042/xla/service/gpu/fusions/mlir/mlir_fusion_emitter.h)
-* Optimization and conversion passes: [gpu/fusions/transforms](https://github.com/openxla/xla/tree/61d921cb22abe672d4cf9fdf80b6a63a76ab7042/xla/service/gpu/fusions/transforms)
-* Partition logic: [computation_partitioner.h](https://github.com/openxla/xla/blob/852d2d2e4abfc7459f50cc958edb68c82e5f9ffe/xla/service/gpu/fusions/mlir/computation_partitioner.h)
-* Hero-based emitters: [gpu/fusions](https://github.com/openxla/xla/tree/61d921cb22abe672d4cf9fdf80b6a63a76ab7042/xla/service/gpu/fusions)
-* XLA:GPU ops: [xla_gpu_ops.td](https://github.com/openxla/xla/blob/61d921cb22abe672d4cf9fdf80b6a63a76ab7042/xla/service/gpu/fusions/ir/xla_gpu_ops.td)
+* Compilation pipeline: [emitter_base.h](https://github.com/openxla/xla/blob/cfd16b7f21feff17635c782f4489c0f478178eb9/xla/backends/gpu/codegen/emitters/emitter_base.h)
+* Optimization and conversion passes: [backends/gpu/codegen/transforms](https://github.com/openxla/xla/tree/cfd16b7f21feff17635c782f4489c0f478178eb9/xla/backends/gpu/codegen/transforms)
+* Partition logic: [computation_partitioner.h](https://github.com/openxla/xla/blob/ca62f3e1bc9ea1d808c3a4de0a78bae7453389eb/xla/codegen/emitters/computation_partitioner.h)
+* Hero-based emitters: [backends/gpu/codegen/emitters](https://github.com/openxla/xla/tree/cfd16b7f21feff17635c782f4489c0f478178eb9/xla/backends/gpu/codegen/emitters)
+* XLA:GPU ops: [xla_gpu_ops.td](https://github.com/openxla/xla/blob/cfd16b7f21feff17635c782f4489c0f478178eb9/xla/backends/gpu/codegen/ir/xla_gpu_ops.td)
 * Correctness and lit tests: [gpu/fusions/tests](https://github.com/openxla/xla/tree/925722533aa2ca55219f5c88c1ec333f4e1cbd7c/xla/service/gpu/fusions/tests)
