Commit 973ff43

authored
XeTile GEMM test generator (#721)
1 parent a1f7591 commit 973ff43

8 files changed: +1346 -0 lines changed
scripts/xetile-test-gen/ReadMe.md

Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
# XeTile IMEX-4 test generator and perf reporter
This directory contains test-generation infrastructure that measures performance for a set of test cases and reports the results in Excel format.
The test cases are read from a csv file in this format:
```
BatchSize,M,K,N,dtype,wgm,wgn,sgm,sgn,sgk
1,128,1000,2000,bf16,64,64,8,16,64
1,128,1001,2048,bf16,64,64,8,16,64
1,128,32768,256,bf16,64,64,8,16,64
1,16,1024,1024,bf16,64,64,8,16,64
```
_Note: `wg*` and `sg*` are the work-group and subgroup tile sizes; `sgk` is the step size in the k-loop.
Matrix A is (MxK), B is (KxN), C is (MxN)._
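
As an illustration of the format above, here is a minimal Python sketch that parses such a csv into typed records and derives the matrix shapes from the note. It is not part of the generator scripts: `GemmCase`, `read_cases`, `subgroups_per_workgroup` and `k_loop_steps` are hypothetical names, and the two derived quantities assume that a work-group tile is split evenly into subgroup tiles and that the k-loop advances by `sgk` elements per step.

```python
import csv
import io
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GemmCase:
    """One row of the test-case csv (hypothetical helper type)."""
    batch: int
    m: int
    k: int
    n: int
    dtype: str
    wgm: int
    wgn: int
    sgm: int
    sgn: int
    sgk: int

    def shapes(self):
        """Return the (A, B, C) shapes: A is MxK, B is KxN, C is MxN."""
        return (self.m, self.k), (self.k, self.n), (self.m, self.n)

    def subgroups_per_workgroup(self):
        # Assumption: the work-group tile divides evenly into subgroup tiles.
        return (self.wgm // self.sgm) * (self.wgn // self.sgn)

    def k_loop_steps(self):
        # Assumption: each k-loop iteration consumes sgk elements of K.
        return math.ceil(self.k / self.sgk)


def read_cases(text):
    """Parse csv text with the header shown above into GemmCase records."""
    cases = []
    for row in csv.DictReader(io.StringIO(text)):
        cases.append(GemmCase(
            batch=int(row["BatchSize"]),
            m=int(row["M"]), k=int(row["K"]), n=int(row["N"]),
            dtype=row["dtype"],
            wgm=int(row["wgm"]), wgn=int(row["wgn"]),
            sgm=int(row["sgm"]), sgn=int(row["sgn"]), sgk=int(row["sgk"]),
        ))
    return cases


SAMPLE = """BatchSize,M,K,N,dtype,wgm,wgn,sgm,sgn,sgk
1,128,1000,2000,bf16,64,64,8,16,64
1,16,1024,1024,bf16,64,64,8,16,64
"""

cases = read_cases(SAMPLE)
for case in cases:
    print(case.shapes(), case.subgroups_per_workgroup(), case.k_loop_steps())
```

A check like this can catch malformed rows (missing columns, non-integer sizes) before any `.mlir` files are generated.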

To get reports for different code versions, run `run_tests.sh`. It accepts the following parameters:
1. `--test_csv` - path to the `.csv` file with test cases in the format described above.
2. `--validate` - 0 or 1, default is 0. Validation runs are not profiled.
3. `--gen_default_cases` - 0 or 1, default is 0. Additionally generates hardcoded test cases for a quick check (e.g., 4kx4k, batched 1kx1k).
4. `--report_dir` - path to the directory where test reports in text format will be stored, default is `GEMM_reports`. The directory is created if it doesn't exist.
5. `--verbose` - 0 or 1, default is 0. If set to 1, prints the command used to run each test.
6. `--llvm_build_dir` - path to the LLVM directory where IMEX was built.
Example usage:
Generate tests from csv:
```
./run_tests.sh --test_csv=input_shapes.csv --validate=1 --report_dir=GEMM_reports --llvm_build_dir=../../../llvm-project/build
```
Generate only hardcoded tests (e.g., 4kx4k, 1kx1k) for a quick check:
```
./run_tests.sh --gen_default_cases=1 --validate=0 --verbose=1 --llvm_build_dir=../../../llvm-project/build
```

The script runs the following workflow:
1. `gen_xetile_from_shapes.py` reads the test csv file and generates a corresponding `.mlir` file for each test case.
2. All generated files are executed in profiling mode, and the reported time measurements (and errors) are aggregated in a text file.
3. `report_to_excel.py` parses the text file and fills in the Excel spreadsheet, which also contains additional columns with formulas for TFLOPS and speedup (colored).
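
The exact spreadsheet formulas are not given here. As a rough sketch, the conventional definitions for a batched GEMM count 2·BatchSize·M·K·N floating-point operations (one multiply and one add per inner-product term) and take speedup as baseline time divided by variant time. The function names below are hypothetical, and the time is assumed to be in seconds, so any other unit in the report files would need converting first.

```python
def gemm_flops(batch, m, k, n):
    """Floating-point operations in a batched GEMM: one multiply and one add per term."""
    return 2 * batch * m * k * n


def tflops(batch, m, k, n, seconds):
    """Achieved TFLOPS for a kernel that took `seconds` (assumed unit: seconds)."""
    return gemm_flops(batch, m, k, n) / seconds / 1e12


def speedup(baseline_seconds, variant_seconds):
    """Speedup of a code variant relative to the baseline; values above 1 are faster."""
    return baseline_seconds / variant_seconds


# Example: a single 4096x4096x4096 GEMM that runs in 1 ms.
print(tflops(1, 4096, 4096, 4096, 1e-3))
# Example: the variant takes 1.5 ms where the baseline took 2.0 ms.
print(speedup(2.0e-3, 1.5e-3))
```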

Currently, it generates GEMM code for the baseline implementation and the prefetch version.


---------------
`xetile_testgen.py` has 4 parameters that should be specified:
1. `--code_version` - string; the test generator currently supports `baseline` and `prefetch`.
2. `--validate` - 0 or 1, controls CPU validation. When set to 0, no CPU validation is performed and the environment variable `IMEX_ENABLE_PROFILING` is set to report kernel time.
3. `--print_debug` - 0 or 1, prints some debug info about tile shapes and layouts.
4. `--test_csv` - path to the `.csv` file with test cases in the format described above.

`--prefetch` can be true or false: whether to generate code with prefetch or not.

-------------------

`report_to_excel.py` has 3 parameters:
1. `--reports_dir` - path to the directory with report text files; it MUST contain `baseline.txt` when creating a spreadsheet from scratch.
2. `--report_name` - path to the report used to update the spreadsheet given by `--sheet_name`, appending 3 columns to the right: `time,TFLOPS,speedup`.
3. `--sheet_name` - path to the spreadsheet to update if `--report_name` is specified; otherwise, the name of the spreadsheet to build.

Example: build a spreadsheet from scratch from the reports in a directory:
```
python3 report_to_excel.py --reports_dir=../mydir
```
Example: update an existing spreadsheet:
```
python3 report_to_excel.py --report_name=../new_code_variant.txt --sheet_name=../existing_sheet.xlsx
```
Lines changed: 257 additions & 0 deletions
@@ -0,0 +1,257 @@
1+
BatchSize,M,K,N,dtype,wgm,wgn,sgm,sgn,sgk
2+
512,384,384,64,bf16,256,64,32,32,32
3+
1,128,768,3072,bf16,32,32,16,16,32
4+
1,128,32768,256,bf16,64,64,8,16,64
5+
1,256,32768,512,bf16,64,64,8,16,64
6+
1,16384,50304,3072,bf16,256,256,32,64,32
7+
1,16384,3072,50304,bf16,512,128,32,64,32
8+
16,64,15000,64,bf16,32,32,8,32,32
9+
1,128,3072,768,bf16,32,32,8,32,32
10+
1,1000,256,2048,bf16,64,256,16,64,32
11+
1,6272,256,768,bf16,192,256,24,64,32
12+
1,49,3072,768,bf16,32,32,8,32,32
13+
1,197,3072,768,bf16,64,64,8,32,32
14+
1,196,2048,512,bf16,64,64,8,16,64
15+
1,256,1000,2048,bf16,128,128,32,32,32
16+
1,49,4096,1024,bf16,64,64,8,16,64
17+
1,512,32768,1024,bf16,128,128,16,32,32
18+
1,749,2048,512,bf16,128,128,16,32,32
19+
1,128,768,768,bf16,32,32,16,16,32
20+
1,49,2048,1024,bf16,64,64,8,16,64
21+
1,392,4096,1024,bf16,128,128,16,32,32
22+
16,64,1500,64,bf16,64,64,8,32,32
23+
1,392,2048,1024,bf16,64,64,8,32,32
24+
128,384,384,32,bf16,256,64,32,32,32
25+
1,54,11456,32,bf16,32,32,8,32,32
26+
1,54,11456,16,bf16,32,32,8,32,32
27+
1,840,2048,512,bf16,128,128,16,32,32
28+
1,1,32768,256,bf16,64,64,8,16,64
29+
1,32,1856,128,bf16,64,64,8,16,64
30+
1,128,2048,10,bf16,32,32,8,32,32
31+
1,1,2048,1000,bf16,32,32,8,32,32
32+
1,16,1024,2,bf16,32,32,8,32,32
33+
1,1,16,2,bf16,16,32,8,16,16
34+
1,1,768,2,bf16,32,32,8,32,32
35+
1,5,512,2,bf16,32,32,16,16,32
36+
1,1,512,72,bf16,32,32,8,32,32
37+
1,16,256,4,bf16,32,32,8,32,32
38+
1,16,2,1024,bf16,32,32,8,32,32
39+
1,2,16,1024,bf16,16,32,8,16,16
40+
1,16,16,256,bf16,32,32,8,32,32
41+
1,5000,8,2,bf16,256,64,16,64,16
42+
384,128,128,64,bf16,128,128,32,32,32
43+
1,1,1024,1000,bf16,32,32,8,32,32
44+
1,1,768,768,bf16,32,32,8,32,32
45+
1,179,54,16,bf16,32,32,8,32,32
46+
1,5000,4,8,bf16,32,32,8,32,32
47+
1,128,2,768,bf16,16,32,8,16,16
48+
1,49,16,256,bf16,32,32,8,32,32
49+
1,1,768,1000,bf16,32,32,8,32,32
50+
1,179,96,16,bf16,32,32,16,16,32
51+
16,4,4,1500,bf16,64,256,16,64,32
52+
1,14,128,200,bf16,32,32,8,32,32
53+
1,179,54,32,bf16,32,32,8,32,32
54+
1,128,14,200,bf16,16,32,8,16,16
55+
1,32,128,128,bf16,32,32,8,32,32
56+
1,179,96,32,bf16,32,32,8,32,32
57+
1,128,32,128,bf16,32,32,8,32,32
58+
1,16,4096,256,bf16,64,64,8,16,64
59+
1,4,1024,1024,bf16,64,64,8,16,64
60+
1,49,96,192,bf16,32,32,8,32,32
61+
1,8,1024,1000,bf16,32,32,8,32,32
62+
1,16,512,256,bf16,32,32,8,32,32
63+
1,128,128,64,bf16,32,32,8,32,32
64+
1,16,1024,512,bf16,64,64,8,16,64
65+
1,16,256,512,bf16,32,32,8,32,32
66+
1,392,1024,1024,bf16,128,128,32,32,32
67+
1,256,16,512,bf16,64,64,8,32,32
68+
1,10,128,2048,bf16,64,64,8,32,32
69+
1,72,512,144,bf16,32,32,16,16,32
70+
1,16,512,1024,bf16,32,32,16,16,32
71+
1,128,10,2048,bf16,64,64,8,32,32
72+
1,49,192,384,bf16,32,32,8,32,32
73+
1,128,256,128,bf16,32,32,8,32,32
74+
1,16,1024,1024,bf16,64,64,8,16,64
75+
1,128,200,200,bf16,32,32,8,32,32
76+
1,512,144,72,bf16,32,32,8,32,32
77+
1,24576,2,1024,bf16,256,256,32,64,16
78+
1,200,128,200,bf16,32,32,8,32,32
79+
1,512,72,144,bf16,32,32,8,32,32
80+
1,32,768,768,bf16,32,32,8,32,32
81+
1,24576,1024,2,bf16,256,64,32,32,32
82+
1,32,128,1856,bf16,64,64,8,32,32
83+
1,49,768,1536,bf16,32,32,8,32,32
84+
1,49,384,768,bf16,32,32,8,32,32
85+
1,512,16,1024,bf16,256,64,16,64,16
86+
384,24,24,64,bf16,32,32,8,32,32
87+
1,1856,32,128,bf16,32,32,8,32,32
88+
384,24,64,24,bf16,32,32,16,16,32
89+
1,196,1536,384,bf16,64,64,8,32,32
90+
1,49,768,768,bf16,32,32,8,32,32
91+
1,49,1024,1024,bf16,64,64,8,16,64
92+
1,3072,16384,12288,bf16,256,256,32,64,16
93+
1,11456,16,54,bf16,256,64,16,64,16
94+
1,11456,54,16,bf16,256,64,32,32,32
95+
1024,384,64,384,bf16,96,256,32,64,32
96+
256,2048,96,2048,bf16,128,128,32,32,32
97+
1,16,256,4096,bf16,32,32,8,32,32
98+
1,16000,12544,1024,bf16,256,256,32,64,32
99+
12,128,128,64,bf16,32,32,8,32,32
100+
1,11456,96,16,bf16,256,64,32,32,32
101+
1,4096,16,256,bf16,128,128,16,32,32
102+
1,1024,2048,364,bf16,128,128,32,32,32
103+
1,1024,16,1024,bf16,256,64,16,64,16
104+
1,512,768,92,bf16,32,32,16,16,32
105+
64,64,512,512,bf16,64,256,16,64,32
106+
1,1000,1024,364,bf16,128,128,32,32,32
107+
16,64,64,1500,bf16,96,256,32,64,32
108+
1,11456,32,54,bf16,256,64,32,32,32
109+
1,11456,16,96,bf16,256,64,16,64,16
110+
80,83,84,64,bf16,32,32,8,32,32
111+
80,84,84,64,bf16,32,32,8,32,32
112+
1,196,1024,512,bf16,64,64,8,32,32
113+
1,196,384,384,bf16,32,32,8,32,32
114+
1,287,512,256,bf16,32,32,8,32,32
115+
1,2048,364,1024,bf16,256,128,32,32,32
116+
1,32768,256,512,bf16,256,256,32,64,16
117+
1,11456,54,32,bf16,32,32,8,32,32
118+
1,3136,512,128,bf16,64,64,8,32,32
119+
384,128,64,128,bf16,128,128,32,32,32
120+
80,84,64,84,bf16,128,128,32,32,32
121+
1,768,512,92,bf16,32,32,16,16,32
122+
1,11456,96,32,bf16,32,32,8,32,32
123+
256,2048,2048,96,bf16,128,128,16,32,32
124+
1024,384,384,64,bf16,64,64,8,16,64
125+
1,49,1024,3072,bf16,64,64,8,32,32
126+
1,830,512,512,bf16,64,64,8,32,32
127+
1,3136,96,96,bf16,64,64,8,32,32
128+
1,128,1000,2000,bf16,64,64,8,16,64
129+
1,6272,256,1024,bf16,256,256,32,64,16
130+
1,784,192,192,bf16,64,64,8,32,32
131+
1,196,512,2048,bf16,64,64,8,32,32
132+
1,196,512,512,bf16,64,64,8,32,32
133+
1,49,768,2304,bf16,64,64,8,32,32
134+
1,512,92,768,bf16,64,64,8,32,32
135+
1,49,1024,4096,bf16,64,64,8,16,64
136+
1,32768,512,256,bf16,256,256,32,64,16
137+
1,3136,96,288,bf16,256,128,32,32,32
138+
1,3136,128,128,bf16,64,64,8,32,32
139+
1,512,768,256,bf16,64,64,8,16,64
140+
1,784,1024,256,bf16,64,64,8,16,64
141+
1,197,768,2304,bf16,128,128,32,32,32
142+
1,840,512,2048,bf16,256,128,32,32,32
143+
1,11456,32,96,bf16,256,128,32,32,32
144+
1,49,768,3072,bf16,64,64,8,32,32
145+
1,5000,512,256,bf16,96,256,32,32,32
146+
1,6272,256,256,bf16,256,128,32,32,32
147+
384,384,384,64,bf16,256,64,32,32,32
148+
1,196,384,1152,bf16,128,128,32,32,32
149+
1,3136,96,384,bf16,256,128,32,32,32
150+
1,3136,128,384,bf16,256,128,32,32,32
151+
1,784,256,256,bf16,64,64,8,32,32
152+
256,96,2048,2048,bf16,128,128,16,32,32
153+
1,32768,256,128,bf16,512,128,32,64,32
154+
1,784,768,192,bf16,64,64,8,32,32
155+
1,840,512,512,bf16,64,64,8,32,32
156+
1,197,768,768,bf16,64,64,8,32,32
157+
1,3136,128,512,bf16,256,128,32,32,32
158+
1,32768,128,256,bf16,256,256,32,64,32
159+
1,784,192,768,bf16,128,128,16,32,32
160+
1,128,2000,1000,bf16,64,64,8,32,32
161+
1,128,1792,1000,bf16,64,64,8,32,32
162+
1,768,512,256,bf16,64,64,8,16,64
163+
1,25088,128,128,bf16,512,128,32,64,32
164+
256,64,512,512,bf16,64,256,16,64,32
165+
256,512,64,512,bf16,256,256,32,64,32
166+
1,128,1000,1000,bf16,64,64,8,32,32
167+
1,256,2048,1000,bf16,64,64,8,16,64
168+
1,768,3072,768,bf16,128,128,16,32,32
169+
1,197,768,3072,bf16,128,128,32,32,32
170+
1,749,512,2048,bf16,256,128,32,32,32
171+
1,942,128,1000,bf16,128,128,16,32,32
172+
16,64,64,15000,bf16,96,256,32,64,32
173+
1,6272,512,256,bf16,256,128,32,32,32
174+
1,768,768,768,bf16,128,128,16,32,32
175+
1,128,5270,1000,bf16,64,64,8,32,32
176+
64,512,64,512,bf16,256,256,32,64,16
177+
1,512,830,512,bf16,64,64,8,16,64
178+
1,128,942,1000,bf16,64,64,8,32,32
179+
1,784,256,1024,bf16,128,128,16,32,32
180+
1,32768,512,1024,bf16,256,256,32,64,32
181+
1,3136,384,96,bf16,64,64,8,32,32
182+
1,4096,768,3072,bf16,256,256,32,64,32
183+
1,128,1280,1000,bf16,64,64,8,32,32
184+
1,128,1536,1000,bf16,64,64,8,32,32
185+
1,784,192,576,bf16,128,128,32,32,32
186+
1,5270,128,1000,bf16,192,256,24,64,32
187+
1,196,384,1536,bf16,64,64,8,32,32
188+
1,512,256,768,bf16,64,64,8,32,32
189+
1,1000,12544,1024,bf16,128,128,16,32,32
190+
1,768,768,3072,bf16,192,256,24,64,32
191+
1,512,840,512,bf16,64,64,8,16,64
192+
1,1568,1024,512,bf16,128,128,32,32,32
193+
1,196,512,1536,bf16,64,64,8,32,32
194+
1,4096,768,768,bf16,256,256,32,64,32
195+
1,1568,512,512,bf16,128,128,16,32,32
196+
1,25088,128,384,bf16,128,128,32,32,32
197+
12,1024,1024,64,bf16,256,64,32,32,32
198+
1,784,512,256,bf16,64,64,8,32,32
199+
1,25088,128,512,bf16,96,256,24,64,32
200+
64,512,512,64,bf16,256,64,32,32,32
201+
256,512,512,64,bf16,256,64,32,32,32
202+
1,1568,512,1536,bf16,192,256,24,64,32
203+
1,392,1024,3072,bf16,96,256,32,32,32
204+
1,1568,2048,512,bf16,128,128,16,32,32
205+
1,1000,1024,1024,bf16,128,128,16,32,32
206+
1,6272,1024,256,bf16,256,128,32,32,32
207+
1,128,1001,2048,bf16,64,64,8,16,64
208+
1,784,256,768,bf16,128,128,16,32,32
209+
1,2048,30522,1024,bf16,192,256,24,64,32
210+
1,12544,2048,1024,bf16,256,256,32,64,16
211+
1,5000,1024,512,bf16,192,256,24,64,32
212+
1,2048,12544,1024,bf16,256,128,32,32,32
213+
1,256,1001,2048,bf16,128,128,32,32,32
214+
1,2048,1024,1024,bf16,256,128,32,32,32
215+
1,2048,1024,364,bf16,128,128,32,32,32
216+
1,2048,1024,12544,bf16,256,256,32,64,32
217+
1,392,1024,4096,bf16,256,128,32,32,32
218+
1,1024,32768,1024,bf16,128,128,16,32,32
219+
1,1024,2048,1024,bf16,128,128,16,32,32
220+
1,1568,512,2048,bf16,256,256,32,64,32
221+
1,2048,840,512,bf16,128,128,16,32,32
222+
1,512,840,2048,bf16,128,128,16,32,32
223+
1,2048,1024,4096,bf16,256,256,32,64,16
224+
1,4096,3072,768,bf16,256,256,32,64,32
225+
1,25088,512,128,bf16,512,128,32,64,32
226+
1,1024,8192,1024,bf16,128,128,16,32,32
227+
1,3072,16384,3072,bf16,192,256,24,64,32
228+
1,16384,12288,3072,bf16,256,256,32,64,32
229+
1,30522,8192,1024,bf16,256,256,32,64,32
230+
1,50304,16384,3072,bf16,256,256,32,64,32
231+
1,12288,16384,3072,bf16,256,256,32,64,32
232+
1,2048,749,512,bf16,128,128,16,32,32
233+
1,512,749,2048,bf16,128,128,16,32,32
234+
1,9216,16384,3072,bf16,256,256,32,64,32
235+
1,1024,2048,1000,bf16,128,128,16,32,32
236+
1,32768,1024,512,bf16,256,256,32,64,16
237+
1,8192,1024,1024,bf16,256,256,32,64,16
238+
1,16384,3072,12288,bf16,256,256,32,64,32
239+
1,30522,2048,1024,bf16,256,256,32,64,32
240+
1,16384,9216,3072,bf16,256,256,32,64,32
241+
1,24576,1024,1024,bf16,256,256,32,64,32
242+
1,8192,1024,4096,bf16,256,256,32,64,32
243+
1,32768,1024,1024,bf16,256,256,32,64,32
244+
1,24576,4096,1024,bf16,256,256,32,64,32
245+
1,16000,1024,1024,bf16,256,256,32,64,16
246+
1,16384,3072,9216,bf16,256,256,32,64,32
247+
1,16384,3072,3072,bf16,256,256,32,64,32
248+
1,2048,4096,1024,bf16,256,128,32,32,32
249+
1,8192,30522,1024,bf16,256,256,32,64,32
250+
1,24576,1024,4096,bf16,256,256,32,64,32
251+
1,8192,4096,1024,bf16,256,256,32,64,32
252+
1,1024,2048,4096,bf16,256,256,32,64,32
253+
1,4096,2048,1024,bf16,256,256,32,64,32
254+
1,5000,845,1024,bf16,192,256,24,64,32
255+
1,4096,8192,1024,bf16,256,256,32,64,32
256+
1,1024,8192,4096,bf16,256,256,32,64,32
257+
1,16000,1024,364,bf16,256,128,32,32,32
