
Commit da5432c

[CI][Benchmarks] Add dashboard info in contrib (#19549)
Add information on the dashboard generation in the contribution guide. Preview: https://github.com/PatKamin/llvm/blob/contrib-guide/devops/scripts/benchmarks/CONTRIB.md
1 parent 69a9503 commit da5432c


devops/scripts/benchmarks/CONTRIB.md

Lines changed: 106 additions & 0 deletions
@@ -67,6 +67,104 @@ The suite is structured around three main components: Suites, Benchmarks, and Re
* `display_name`: Optional user-friendly name for the benchmark (string). Defaults to `name()`.
* `explicit_group`: Optional explicit group name for results (string). Used to group results in visualizations.

## Dashboard and Visualization

The benchmark suite generates an interactive HTML dashboard that visualizes `Result` objects and their metadata.

### Data Flow from Results to Dashboard

1. **Collection Phase:**
    * Benchmarks generate `Result` objects containing performance measurements.
    * The framework combines these with `BenchmarkMetadata` from benchmarks and suites. The metadata is used to define charts.
    * All data is packaged into a `BenchmarkOutput` object containing runs, metadata, and tags for serialization (see the sketch after this list).

2. **Serialization:**
    * For local viewing (`--output-html local`): Data is written as JavaScript variables in `data.js`, which the HTML dashboard loads directly.
    * For remote deployment (`--output-html remote`): Data is written as JSON in `data.json`. The `config.js` file contains the URL where the JSON file is hosted.
    * Historical runs may be separated into archive files for better dashboard load times.

3. **Dashboard Rendering:**
    * JavaScript processes the data to create three chart types:
        * **Historical Results**: Time-series charts showing performance trends over multiple runs, one chart per unique benchmark scenario.
        * **Historical Layer Comparisons**: Time-series charts for grouped results. Benchmark scenarios that can be directly compared are grouped either via `explicit_group()` or by matching the beginning of their labels against predefined groups.
        * **Comparisons**: Bar charts comparing selected runs side-by-side, again grouped by `explicit_group()` or labels.

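To make the data flow above concrete, here is a minimal, illustrative Python sketch of the collection and serialization steps. The class and field names mirror those described in this guide (`Result`, `BenchmarkOutput`, `label`, `value`, `unit`, `stddev`, `git_hash`); the JavaScript variable name and all details of the real definitions in the suite may differ.

```python
# Illustrative sketch only; the real Result/BenchmarkOutput definitions live in
# the benchmark suite and may differ in detail.
import json
from dataclasses import dataclass, asdict


@dataclass
class Result:
    label: str                    # unique benchmark scenario name
    value: float                  # measured value (Y-axis)
    unit: str                     # e.g. "us" or "GB/s"
    suite: str                    # used by the dashboard's suite filter
    stddev: float = 0.0           # shown in chart tooltips
    git_hash: str = ""            # source revision of the benchmark
    lower_is_better: bool = True  # direction indicator on the chart


@dataclass
class BenchmarkOutput:
    runs: list      # benchmark runs, each carrying its Result objects
    metadata: dict  # metadata per benchmark/group, drives chart setup
    tags: dict      # tag name -> description, drives the tag filter


def serialize(output: BenchmarkOutput, mode: str) -> str:
    """Mirror the two serialization paths described above."""
    payload = json.dumps(asdict(output), indent=2)
    if mode == "local":
        # data.js: a JavaScript variable the dashboard can load without a server
        # (the variable name here is made up).
        return f"benchmarkRuns = {payload};"
    # data.json: plain JSON fetched from the URL configured in config.js.
    return payload
```
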
### Chart Types and Result Mapping

**Historical Results (Time-series):**
* One chart per unique `result.label`.
* X-axis: `BenchmarkRun.date` (time).
* Y-axis: `result.value` with `result.unit`.
* Multiple lines for different `BenchmarkRun.name` entries.
* Points include `result.stddev`, `result.git_hash`, and environment info in tooltips (see the sketch below).

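For illustration, a single point on such a chart maps `Result` and `BenchmarkRun` fields roughly as in the Python-style sketch below. The function name and dictionary keys are hypothetical; the actual mapping is implemented by the dashboard's JavaScript.

```python
# Hypothetical sketch of how one time-series point maps Result fields to chart
# properties; the real dashboard does this in JavaScript.
def to_chart_point(run, result):
    return {
        "series": run.name,        # one line per BenchmarkRun.name
        "x": run.date,             # time axis
        "y": result.value,         # measured value, in result.unit
        "tooltip": {
            "stddev": result.stddev,
            "git_hash": result.git_hash,
            "env": result.env,     # environment info
        },
    }
```
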
**Historical Layer Comparisons:**
* Groups related results using `benchmark.explicit_group()` or `result.label` prefixes.
* Useful for comparing different implementations/configurations of the same benchmark.
* Same time-series format, but with grouped data.

**Comparisons (Bar charts):**
* Compares selected runs side-by-side.
* X-axis: `BenchmarkRun.name`.
* Y-axis: `result.value` with `result.unit`.
* One bar per selected run.

### Dashboard Features Controlled by Results/Metadata

**Visual Properties:**
* **Chart Title**: `metadata.display_name` or `result.label`.
* **Y-axis Range**: `metadata.range_min` and `range_max` (when custom ranges are enabled).
* **Direction Indicator**: `result.lower_is_better` (shows "Lower/Higher is better").
* **Grouping**: `benchmark.explicit_group()` groups related results together.

**Filtering and Organization:**
* **Suite Filters**: Filter by `result.suite`.
* **Tag Filters**: Filter by `metadata.tags`.
* **Regex Search**: Search by `result.label` patterns; `metadata.display_name` patterns are not searchable.
* **Stability**: Hide/show based on `metadata.unstable`.

**Information Display:**
* **Description**: `metadata.description` appears prominently above charts.
* **Notes**: `metadata.notes` provides additional context (toggleable).
* **Tags**: `metadata.tags` displayed as colored badges with descriptions.
* **Command Details**: Shows `result.command` and `result.env` in expandable sections.
* **Git Information**: `result.git_url` and `result.git_hash` for benchmark source tracking.

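As an example of how these fields come together, here is a hedged sketch of metadata for a single benchmark. The keyword arguments simply mirror the fields described in this section, and the import path is an assumption; check the actual `BenchmarkMetadata` class in the suite for its real signature. All values are made up.

```python
# Illustrative only: argument names mirror the metadata fields described above;
# the real BenchmarkMetadata class may have a different signature.
from utils.result import BenchmarkMetadata  # assumed import path

metadata = BenchmarkMetadata(
    display_name="SubmitKernel (in-order queue)",  # chart title
    description="Measures kernel submission overhead on an in-order queue.",
    notes="CPU-bound; results are sensitive to background load.",
    tags=["submit", "latency"],                    # drives the tag filter
    range_min=0.0,                                 # fixed Y-axis range for
    range_max=50.0,                                # consistent comparisons
    unstable=None,                                 # set a reason string to hide by default
)
```
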
### Dashboard Interaction

**Run Selection:**
* Users select which `BenchmarkRun.name` entries to compare.
* Default selection uses `BenchmarkOutput.default_compare_names`.
* Changes affect all chart types simultaneously.

**URL State Preservation:**
* All filters, selections, and options are preserved in URL parameters.
* Enables sharing a specific dashboard view by copying its URL.

### Best Practices for Dashboard-Friendly Results

**Naming:**
* Use unique, descriptive `result.label` names.
* Consider `metadata.display_name` for prettier chart titles.
* Ensure `benchmark.name()` is unique across all suites.

**Grouping:**
* Use `benchmark.explicit_group()` to group related measurements (see the sketch after this list).
* Ensure grouped results have the same `result.unit`.
* Group metadata keys in `Suite.additional_metadata()` should match group prefixes.

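A hedged sketch of grouping in practice: two directly comparable benchmark variants return the same `explicit_group()`, and the suite provides group-level metadata under a key that is a prefix of that group name. `benches.base.Benchmark` is the base class named in this guide; the other import path and all class names, labels, and values below are assumptions for illustration.

```python
# Illustrative sketch: two comparable variants share an explicit group so the
# dashboard can plot them together in one comparison chart.
from benches.base import Benchmark, Suite       # Suite location assumed
from utils.result import BenchmarkMetadata      # assumed import path


class SubmitKernelSYCL(Benchmark):              # hypothetical SYCL variant
    def name(self):
        return "SubmitKernel SYCL in-order"

    def explicit_group(self):
        return "SubmitKernel in-order"          # same group as the L0 variant below


class SubmitKernelL0(Benchmark):                # hypothetical Level Zero variant
    def name(self):
        return "SubmitKernel L0 in-order"

    def explicit_group(self):
        return "SubmitKernel in-order"          # identical group => directly comparable


class MySuite(Suite):                           # hypothetical suite
    def additional_metadata(self):
        # "SubmitKernel" is a prefix of the explicit_group() names above, so this
        # group-level metadata applies to both variants.
        return {
            "SubmitKernel": BenchmarkMetadata(
                description="Kernel submission overhead across API layers.",
            ),
        }
```
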
**Metadata:**
* Provide `metadata.description` for user understanding.
* Use `metadata.notes` for implementation details or caveats.
* Tag with relevant `metadata.tags` for filtering.
* Set `metadata.range_min`/`range_max` for consistent comparisons when needed.

**Stability:**
* Mark unstable benchmarks with `metadata.unstable` to hide them by default.

## Adding New Benchmarks

1. **Create Benchmark Class:** Implement a new class inheriting from `benches.base.Benchmark`. Implement required methods (`setup`, `run`, `teardown`, `name`) and optional ones (`description`, `get_tags`, etc.) as needed.
@@ -84,6 +182,14 @@ The suite is structured around three main components: Suites, Benchmarks, and Re
* **Use unique names:** Ensure `benchmark.name()` and `result.label` are descriptive and unique.
* **Group related results:** Use `benchmark.explicit_group()` consistently for results you want to compare directly in outputs. Ensure units match within a group. If defining group-level metadata in the Suite, ensure the chosen explicit_group name starts with the corresponding key defined in `additional_metadata`.
* **Test locally:** Before submitting changes, test with relevant drivers/backends (e.g., using `--compute-runtime --build-igc` for L0). Check the visualization locally if possible (`--output-markdown --output-html`, then open the generated files).
* **Test dashboard visualization:** When adding new benchmarks, always generate and review the HTML dashboard to ensure:
    * Chart titles and labels are clear and readable.
    * Results are grouped logically using `explicit_group()`.
    * Metadata (description, notes, tags) displays correctly.
    * Y-axis ranges are appropriate (consider setting `range_min`/`range_max` if needed).
    * Filtering by suite and tags works as expected.
    * Time-series trends make sense for historical data.
* **Tip**: Use `--dry-run --output-html local` to regenerate the dashboard without re-running benchmarks. This uses existing historical data and is useful for testing metadata changes, new groupings, or dashboard improvements.

## Utilities
