    * The framework combines these with `BenchmarkMetadata` from benchmarks and suites; the metadata is used to define the charts.
    * All data is packaged into a `BenchmarkOutput` object containing runs, metadata, and tags for serialization (a rough sketch of this flow follows the list below).
2. **Serialization:**
    * For local viewing (`--output-html local`): Data is written as JavaScript variables in `data.js`, which the HTML dashboard loads directly.
    * For remote deployment (`--output-html remote`): Data is written as JSON in `data.json`. The `config.js` file contains the URL where the JSON file is hosted.
    * Historical runs may be separated into archive files to improve dashboard load times.
3. **Dashboard Rendering:**
    * JavaScript processes the data to create three chart types:
        * **Historical Results**: Time-series charts showing performance trends over multiple runs, with one chart per unique benchmark scenario.
        * **Historical Layer Comparisons**: Time-series charts for grouped results. Benchmark scenarios that can be directly compared are grouped either via `explicit_group()` or by matching the beginning of their labels against predefined groups.
        * **Comparisons**: Bar charts comparing selected runs side-by-side, again based on `explicit_group()` or labels.
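
The framework's own output and serialization code defines the precise shape of this data; the sketch below only illustrates the flow described above. The field names `runs`, `metadata`, and `tags` come from this document, while the class name `BenchmarkOutputSketch`, the helper functions, and the `benchmarkRuns` JavaScript variable name are placeholders.

```python
# Illustrative sketch only: the real BenchmarkOutput and serialization logic
# live in the framework. Helper names and the JS variable name are placeholders.
import json
from dataclasses import dataclass, field, asdict


@dataclass
class BenchmarkOutputSketch:
    runs: list = field(default_factory=list)      # BenchmarkRun entries (as dicts here)
    metadata: dict = field(default_factory=dict)  # BenchmarkMetadata per benchmark/group
    tags: dict = field(default_factory=dict)      # tag name -> description


def write_local(output: BenchmarkOutputSketch, path: str = "data.js") -> None:
    # --output-html local: embed the data as a JavaScript variable so the
    # dashboard can be opened straight from disk without fetching anything.
    with open(path, "w") as f:
        f.write("benchmarkRuns = " + json.dumps(asdict(output)) + ";\n")


def write_remote(output: BenchmarkOutputSketch, path: str = "data.json") -> None:
    # --output-html remote: plain JSON, fetched from the URL listed in config.js.
    with open(path, "w") as f:
        json.dump(asdict(output), f)
```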
### Chart Types and Result Mapping
**Historical Results (Time-series):**
* One chart per unique `result.label`.
* X-axis: `BenchmarkRun.date` (time).
* Y-axis: `result.value` with `result.unit`.
* Multiple lines for different `BenchmarkRun.name` entries.
* Points include `result.stddev`, `result.git_hash`, and environment info in tooltips.
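
As a rough illustration of this mapping (the dashboard does this client-side in JavaScript), the grouping amounts to something like the sketch below. It assumes `runs` is a list of objects exposing the `BenchmarkRun` and `result` fields named above, including a `results` list per run; that layout is an assumption made for the example.

```python
from collections import defaultdict


def historical_series(runs):
    """Sketch: one chart per result.label, one line per run name, points over time."""
    charts = defaultdict(lambda: defaultdict(list))  # label -> run name -> points
    for run in runs:
        for result in run.results:  # assumed layout: each run carries its results
            charts[result.label][run.name].append({
                "x": run.date,            # BenchmarkRun.date
                "y": result.value,        # plotted in result.unit
                "stddev": result.stddev,  # shown in tooltips
                "git_hash": result.git_hash,
            })
    return charts
```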
**Historical Layer Comparisons:**
* Groups related results using `benchmark.explicit_group()` or `result.label` prefixes.
* Useful for comparing different implementations/configurations of the same benchmark.
* Same time-series format but with grouped data.
**Comparisons (Bar charts):**
* Compares selected runs side-by-side.
* X-axis: `BenchmarkRun.name`.
* Y-axis: `result.value` with `result.unit`.
* One bar per selected run.
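
In the same spirit, the comparison view reduces to picking one value per selected run for a given label (again only a sketch, using the same assumed `runs` layout as the previous example):

```python
def comparison_bars(runs, selected_names, label):
    """Sketch: one bar per selected run for a single result.label."""
    bars = []
    for run in runs:
        if run.name not in selected_names:
            continue
        for result in run.results:  # assumed layout, as above
            if result.label == label:
                bars.append({"run": run.name, "value": result.value, "unit": result.unit})
    return bars
```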
### Dashboard Features Controlled by Results/Metadata
**Visual Properties:**
* **Chart Title**: `metadata.display_name` or `result.label`.
* **Y-axis Range**: `metadata.range_min` and `metadata.range_max` (when custom ranges are enabled).
* **Direction Indicator**: `result.lower_is_better` (shows "Lower/Higher is better").
* **Grouping**: `benchmark.explicit_group()` groups related results together.
**Filtering and Organization:**
* **Suite Filters**: Filter by `result.suite`.
* **Tag Filters**: Filter by `metadata.tags`.
* **Regex Search**: Search by `result.label` patterns; `metadata.display_name` patterns are not searchable.
* **Stability**: Hide/show benchmarks based on `metadata.unstable`.
* All filters, selections, and options are preserved in URL parameters.
* Enables sharing specific dashboard views by copying the URL.
### Best Practices for Dashboard-Friendly Results
**Naming:**
* Use unique, descriptive `result.label` names.
* Consider `metadata.display_name` for prettier chart titles.
* Ensure `benchmark.name()` is unique across all suites.
**Grouping:**
* Use `benchmark.explicit_group()` to group related measurements.
* Ensure grouped results have the same `result.unit`.
* Keys in `Suite.additional_metadata()` should match the prefixes of the corresponding `explicit_group()` names.
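
For example (a sketch with hypothetical `Foo*` names; only `explicit_group()`, `additional_metadata()`, and the `benches.base.Benchmark` base class come from this document, and the `BenchmarkMetadata` constructor arguments are assumed):

```python
from benches.base import Benchmark  # documented base class
# Suite and BenchmarkMetadata imports depend on the framework layout (assumed here).


class FooLatencySubmit(Benchmark):
    def explicit_group(self):
        # Starts with "FooLatency", matching the suite-level metadata key below.
        return "FooLatency Submit"


class FooLatencyQuery(Benchmark):
    def explicit_group(self):
        return "FooLatency Query"  # same prefix -> shares the same group metadata


class FooSuite(Suite):
    def additional_metadata(self):
        return {
            # Key is a prefix of the explicit_group() names above.
            "FooLatency": BenchmarkMetadata(
                description="Latency of hypothetical Foo operations.",
            ),
        }
```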
**Metadata:**
* Provide `metadata.description` so users understand what the benchmark measures.
* Use `metadata.notes` for implementation details or caveats.
* Tag with relevant `metadata.tags` for filtering.
* Set `metadata.range_min`/`range_max` for consistent comparisons when needed.
**Stability:**
* Mark unstable benchmarks with `metadata.unstable` to hide them by default.
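
Taken together, a metadata entry following these practices might look roughly like this (a sketch: the keyword names mirror the `metadata.*` fields described above, but the exact `BenchmarkMetadata` constructor, and whether `unstable` takes a flag or a reason string, are assumptions):

```python
# Field names mirror the metadata.* attributes described above; the exact
# BenchmarkMetadata constructor is framework-specific (assumed here).
BenchmarkMetadata(
    description="Measures submission latency of a hypothetical Foo API.",
    notes="The first iteration includes one-time initialization and is excluded.",
    tags=["submit", "latency", "micro"],
    range_min=0.0,    # keep the Y-axis consistent across comparisons
    range_max=100.0,
    unstable="Results vary heavily on shared machines; hidden by default.",
)
```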
## Adding New Benchmarks
1. **Create Benchmark Class:** Implement a new class inheriting from `benches.base.Benchmark`. Implement the required methods (`setup`, `run`, `teardown`, `name`) and optional ones (`description`, `get_tags`, etc.) as needed; a minimal skeleton is sketched below.
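
The skeleton below is only a sketch, not a drop-in implementation: the class name is hypothetical, and the method bodies, the `run()` signature, and how results are constructed and returned are framework-specific, so treat everything beyond the documented method names as an assumption.

```python
from benches.base import Benchmark  # documented base class


class FooSubmitLatency(Benchmark):  # hypothetical benchmark
    """Measures submission latency of a hypothetical Foo API."""

    def name(self):
        # Must be unique across all suites.
        return "foo_submit_latency"

    def description(self):  # optional
        return "Time to submit an empty Foo work item."

    def get_tags(self):  # optional, used for dashboard filtering
        return ["submit", "latency", "micro"]

    def setup(self):
        # Build or fetch the workload, prepare input data, etc.
        ...

    def run(self):  # the real signature may take additional arguments
        # Execute the workload and return its results; Result construction
        # is framework-specific and omitted from this sketch.
        ...

    def teardown(self):
        # Clean up temporary files or processes created in setup().
        ...
```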
* **Use unique names:** Ensure `benchmark.name()` and `result.label` are descriptive and unique.
* **Group related results:** Use `benchmark.explicit_group()` consistently for results you want to compare directly in outputs. Ensure units match within a group. If defining group-level metadata in the Suite, ensure the chosen `explicit_group()` name starts with the corresponding key defined in `additional_metadata()`.
* **Test locally:** Before submitting changes, test with the relevant drivers/backends (e.g., using `--compute-runtime --build-igc` for L0). Check the visualization locally if possible (`--output-markdown --output-html`, then open the generated files).
* **Test dashboard visualization:** When adding new benchmarks, always generate and review the HTML dashboard to ensure:
    * Chart titles and labels are clear and readable.
    * Results are grouped logically using `explicit_group()`.
    * Y-axis ranges are appropriate (consider setting `range_min`/`range_max` if needed).
    * Filtering by suite and tags works as expected.
    * Time-series trends make sense for historical data.
* **Tip**: Use `--dry-run --output-html local` to regenerate the dashboard without re-running benchmarks. This uses existing historical data and is useful for testing metadata changes, new groupings, or dashboard improvements.