Commit ed96d29

efric authored and keshavvinayak01 committed
[docs] Add documentation for updating golden outputs (iree-org#21641)
This PR documents the process for updating the golden outputs and verifying them with the accuracy script.

Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: keshavvinayak01 <[email protected]>
1 parent 40e274e commit ed96d29

3 files changed: +118 -0 lines changed


docs/website/docs/developers/general/testing-guide.md

Lines changed: 2 additions & 0 deletions
@@ -538,6 +538,8 @@ Types of Sharktank tests:
 The quality and benchmark test config files are stored in
 [`tests/external/iree-test-suites/sharktank_models`](https://github.com/iree-org/iree/tree/main/tests/external/iree-test-suites/sharktank_models).

+Detailed steps on how to update the SDXL golden outputs may be found [here](../update-sdxl-golden-outputs.md).
+
 <!-- TODO(scotttodd): document how to coordinate changes across these projects -->

 ### SHARK-TestSuite

docs/website/docs/developers/update-sdxl-golden-outputs.md

Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
---
icon: material/lightbulb-on
---

# Updating SDXL Golden Outputs for IREE CI

Golden outputs are reference results generated from a known-good version of the
SDXL pipeline. They serve as the “ground truth” for CI quality tests in IREE,
ensuring that future changes do not silently alter accuracy. When a change
affects the numerics (e.g., by modifying the order of floating-point
operations), the outputs can differ from the goldens. In such cases, you must
regenerate the golden outputs so that CI reflects the new expected results. This
page describes the end-to-end process: verifying accuracy, generating new
outputs, uploading them to storage, bumping the version in configuration, and
re-running CI.
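
To make the idea of a golden comparison concrete, the following is a minimal
Python sketch of checking a candidate output against a golden within an absolute
tolerance. It assumes both buffers are raw little-endian `f16` data; the helper
names and the tolerance value are hypothetical, and this is not the comparison
logic that CI actually runs.

```python
import struct


def read_f16(raw: bytes) -> list[float]:
    """Decode raw little-endian IEEE half-precision values into Python floats."""
    count = len(raw) // 2
    return list(struct.unpack(f"<{count}e", raw[: count * 2]))


def max_abs_diff(golden: list[float], candidate: list[float]) -> float:
    """Return the largest element-wise absolute difference."""
    if len(golden) != len(candidate):
        raise ValueError("outputs have different element counts")
    return max((abs(g - c) for g, c in zip(golden, candidate)), default=0.0)


def matches_golden(golden: list[float], candidate: list[float], atol: float) -> bool:
    """True when every element is within `atol` of the golden value."""
    return max_abs_diff(golden, candidate) <= atol


# Illustrative in-memory buffers; real goldens would be read from .bin files.
golden = read_f16(struct.pack("<3e", 1.0, 2.0, 3.0))
candidate = read_f16(struct.pack("<3e", 1.0, 2.5, 3.0))
print(max_abs_diff(golden, candidate))
```

A change that only reorders floating-point operations would typically produce a
small but nonzero difference here, which is the situation in which the goldens
need to be regenerated.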

## Verify accuracy before updating goldens

Before updating golden outputs, first confirm that your change maintains
acceptable accuracy by following the
[accuracy-testing steps](https://github.com/nod-ai/SHARK-MLPERF/blob/dev/code/stable-diffusion-xl/development.md#test-accuracy-only)
in the SHARK-MLPERF `development.md`.

A straightforward way to test your change is to edit
`sdxl_harness_rocm_shortfin_from_source_iree.dockerfile` so that it builds your
IREE commit and exposes the right tooling:

- Build your IREE commit and add the build’s tools to `PATH`.
- Add your IREE Python bindings to `PYTHONPATH`.
- Remove the prebuilt wheels for `iree-base-compiler` and `iree-base-runtime` so
  you’re testing your own build.

Run the accuracy script (`run_accuracy_mi325x.sh`) and be mindful of
platform-specific settings. If you are running in SPX mode, update the available
device IDs accordingly. On MI300X, set `CPD=1` and use `BATCH_SIZE=32`. Accuracy
is considered acceptable if the FID and CLIP scores fall within the advertised
ranges.
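
The acceptance rule above amounts to two closed-interval checks. The sketch
below shows one way to express it in Python; the `ScoreRange` type, the function
name, and every numeric bound are placeholders invented for illustration, so
substitute the advertised ranges before relying on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreRange:
    """Closed interval [low, high] for a quality metric."""

    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


def accuracy_ok(fid: float, clip: float,
                fid_range: ScoreRange, clip_range: ScoreRange) -> bool:
    """Both scores must fall inside their respective ranges."""
    return fid_range.contains(fid) and clip_range.contains(clip)


# Placeholder bounds only; these are NOT the advertised ranges.
FID_RANGE = ScoreRange(low=10.0, high=20.0)
CLIP_RANGE = ScoreRange(low=30.0, high=40.0)

print(accuracy_ok(fid=15.0, clip=35.0, fid_range=FID_RANGE, clip_range=CLIP_RANGE))
```

Note that both metrics must pass: a run with an in-range FID but an out-of-range
CLIP score is still rejected.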

## Generate new outputs with your IREE build

Once accuracy is confirmed, generate new outputs using the same inputs that CI
consumes. Both inputs and outputs live in the `sharkpublic` Azure container. If
you do not already have the inputs, locate and download the input files for
your model revision and place them in a local directory. The exact paths are
listed in the relevant JSON file in
`tests/external/iree-test-suites/sharktank_models/quality_tests/sdxl/`.
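
If the JSON file is large, a small script can list the candidate paths for you.
The sketch below walks a parsed JSON document and collects every string value
ending in `.bin`. The example config, including its keys and paths, is made up
for illustration; the real files may be structured differently.

```python
import json


def find_strings(node, suffix):
    """Yield every string value in a parsed JSON tree that ends with `suffix`."""
    if isinstance(node, str):
        if node.endswith(suffix):
            yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from find_strings(value, suffix)
    elif isinstance(node, list):
        for item in node:
            yield from find_strings(item, suffix)


# Made-up config for illustration only; the real file's keys may differ.
EXAMPLE = json.loads("""
{
  "inputs": ["example/dir/punet_input0.bin", "example/dir/punet_input1.bin"],
  "outputs": ["example/dir/punet_out.bin"],
  "threshold": 0.1
}
""")

paths = list(find_strings(EXAMPLE, ".bin"))
for path in paths:
    print(path)
```

To use it on a real config, replace `EXAMPLE` with the result of `json.load` on
the opened file.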

Next, compile the relevant model using your IREE build. The exact flags should
mirror what CI uses for the target you're validating. You can find them in the
failing CI logs or in the same JSON file mentioned above. The example below
shows a representative invocation; replace paths and flags with your local
equivalents as needed.

```bash
iree-build/tools/iree-compile \
  -o model.rocm_gfx942.vmfb \
  punet_fp16.mlir \
  --mlir-timing \
  --mlir-timing-display=list \
  --iree-consteval-jit-debug \
  --iree-hal-target-device=hip \
  --iree-opt-const-eval=false \
  --iree-opt-level=O3 \
  --iree-dispatch-creation-enable-fuse-horizontal-contractions=true \
  --iree-vm-target-truncate-unsupported-floats \
  --iree-llvmgpu-enable-prefetch=true \
  --iree-opt-data-tiling=false \
  --iree-codegen-gpu-native-math-precision=true \
  --iree-codegen-llvmgpu-use-vector-distribution \
  --iree-hip-waves-per-eu=2 \
  --iree-execution-model=async-external \
  --iree-scheduling-dump-statistics-format=json \
  --iree-scheduling-dump-statistics-file=compilation_info.json \
  --iree-preprocessing-pass-pipeline="builtin.module(util.func(iree-flow-canonicalize), iree-preprocessing-transpose-convolution-pipeline, iree-preprocessing-pad-to-intrinsics)" \
  --iree-codegen-transform-dialect-library=/path/to/attention_and_matmul_spec_punet_mi300.mlir \
  --iree-hip-target=gfx942
```

After compilation, run the module to produce the outputs that will become the
new goldens:

```bash
iree-build/tools/iree-run-module \
  --device=hip \
  --module=model.rocm_gfx942.vmfb \
  --function=main \
  --input=1x4x128x128xf16=@${CACHE_DIR}/punet_input0.bin \
  --input=1xf16=@${CACHE_DIR}/punet_input1.bin \
  --input=2x64x2048xf16=@${CACHE_DIR}/punet_input2.bin \
  --input=2x1280xf16=@${CACHE_DIR}/punet_input3.bin \
  --input=2x6xf16=@${CACHE_DIR}/punet_input4.bin \
  --input=1xf16=@${CACHE_DIR}/punet_input5.bin \
  --parameters=model=/path/to/punet_weights.irpa \
  --output=@punet_fp16_out_v{n+1}.0.bin
```
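
Each `--input` flag pairs a shape and element type with a binary file, so a
downloaded file should contain exactly as many bytes as its shape implies. The
Python sketch below computes that expected size. The function name and the
element-size table are assumptions made for this example; only `f16` appears in
the command above.

```python
from math import prod

# Bytes per element; only f16 is used by the command above, the rest are
# included for illustration.
ELEMENT_BYTES = {"f16": 2, "f32": 4, "i8": 1, "i32": 4, "i64": 8}


def expected_bytes(spec: str) -> int:
    """Byte size implied by a shape spec such as '1x4x128x128xf16'."""
    *dims, dtype = spec.split("x")
    return prod(int(d) for d in dims) * ELEMENT_BYTES[dtype]


for spec in ("1x4x128x128xf16", "1xf16", "2x64x2048xf16", "2x1280xf16", "2x6xf16"):
    print(spec, expected_bytes(spec))
```

Comparing these numbers with `os.path.getsize` on each downloaded input is a
quick way to catch a truncated download or a file paired with the wrong shape
before running the module.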

## Upload new outputs to Azure

With the outputs generated, upload the new `v{n+1}` files to the same location
in the `sharkpublic` Azure container as the previous outputs.

```bash
az storage blob upload \
  --account-name sharkpublic \
  --container-name sharkpublic \
  --name <path/in/blob/container> \
  --file <local/file/path>
```

After uploading, update the configuration that tells CI which golden version to
use. This is typically a JSON key whose value encodes the version (for example,
`punet_output_v{n}`). Increment it to `punet_output_v{n+1}` and commit this
change along with any related edits.
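
The version bump itself is a one-character edit in most cases, but it can be
sketched as a small Python helper that increments the trailing number. The
function name is hypothetical, and it assumes the value ends in `_v` followed by
an integer, as in the `punet_output_v{n}` example above.

```python
import re


def bump_version(name: str) -> str:
    """Increment the trailing integer in a name like 'punet_output_v3'."""
    match = re.fullmatch(r"(.*_v)(\d+)", name)
    if match is None:
        raise ValueError(f"no trailing _v<number> in {name!r}")
    prefix, number = match.groups()
    return f"{prefix}{int(number) + 1}"


print(bump_version("punet_output_v3"))
```

Parsing the number as an integer, rather than replacing the last character,
keeps the helper correct when the version rolls over from one digit to two.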

Finally, re-run the CI pipeline and confirm that the quality tests pass against
the newly uploaded outputs.

docs/website/mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -260,6 +260,7 @@ nav:
 - "developers/design-docs/vm.md"
 - "Other topics":
 - "developers/usage-best-practices.md"
+- "developers/update-sdxl-golden-outputs.md"
 - "developers/vulkan-environment-setup.md"
 - "Community":
 - "community/index.md"
