Commit 8b5aeac

Update database documentation
1 parent 0fcb3f8 commit 8b5aeac

1 file changed

+183
-62
lines changed


database/schema.md

Lines changed: 183 additions & 62 deletions
@@ -2,18 +2,19 @@
22

33
Below is an explanation of the current database schema. This schema is duplicated across the (currently) two database backends we support: sqlite and postgres.
44

5-
65
## Overview
76

87
In general, the database is used to track three groups of things:
9-
* Performance run statistics (e.g., instruction count) on a per benchmark, profile, and cache-state basis.
8+
* Performance run statistics (e.g., instruction count) for compile time benchmarks on a per benchmark, profile, and scenario basis.
9+
* Performance run statistics (e.g., instruction count) for runtime benchmarks on a per benchmark basis.
1010
* Self profile data gathered with `-Zself-profile`.
11-
* State when running GitHub bots and the performance runs (e.g., how long it took for a performance suite to run, errors encountered a long the way, etc.)
11+
* State when running GitHub bots and the performance runs (e.g., how long it took for a performance suite to run, errors encountered along the way, etc.)
1212

1313
Below are some diagrams showing the basic layout of the database schema for these three uses:
1414

1515
### Performance run statistics
1616

17+
Here is the diagram for compile-time benchmarks:
1718
```
1819
┌────────────┐ ┌───────────────┐ ┌────────────┐
1920
│ benchmark │ │ collection │ │ artifact │
@@ -36,132 +37,252 @@ Below are some diagrams showing the basic layout of the database schema for thes
3637
└───────────────┘ └──────────┘
3738
```
3839

39-
### Self profile data
40-
41-
**TODO**
42-
43-
### Miscellaneous State
44-
45-
**TODO**
40+
For runtime benchmarks the schema is very similar, but the table names differ:
41+
- `benchmark` => `runtime_benchmark`
42+
- `pstat` => `runtime_pstat`
43+
- `pstat_series` => `runtime_pstat_series`
44+
- The series attributes also differ: `runtime_pstat_series` has only `benchmark` and `metric`.
4645

4746
## Tables
4847

49-
### benchmark
50-
51-
The different types of benchmarks that are run.
52-
53-
The table stores the name of the benchmark as well as whether it is capable of being run using the stable compiler. The benchmark name is used as a foreign key in many of the other tables.
54-
55-
```
56-
sqlite> select * from benchmark limit 1;
57-
name stabilized
58-
---------- ----------
59-
helloworld 0
60-
```
61-
6248
### artifact
6349

64-
A description of a rustc compiler artifact being benchmarked.
50+
A description of a rustc compiler artifact being benchmarked.
6551

6652
This description includes:
6753
* name: usually a commit sha or a tag like "1.51.0" but is free-form text so can be anything.
68-
* date: the date associated with this compiler artifact (usually only when the name is a commit)
54+
* date: the date associated with this compiler artifact (usually only when the name is a commit)
6955
* type: currently one of "master" (i.e., we're testing a merge commit), "try" (someone is testing a PR), and "release" (usually a release candidate - though local compilers also get labeled like this).
7056

7157
```
7258
sqlite> select * from artifact limit 1;
73-
id name date type
74-
---------- ---------- ---------- ----------
75-
1 LOCAL_TEST release
59+
id name date type
60+
---------- ---------- ---------- -------
61+
1 LOCAL_TEST release
7662
```
7763

7864
### collection
7965

8066
A "collection" of benchmarks tied together, differing only by the statistic collected.
8167

82-
This is a way to collect statistics together signifying that they belong to the same logical benchmark run.
68+
This is a way to collect statistics together signifying that they belong to the same logical benchmark run.
8369

84-
Currently the collection also marks the git sha of the currently running collector binary.
70+
Currently, the collection also marks the git sha of the currently running collector binary.
8571

8672
```
8773
sqlite> select * from collection limit 1;
88-
id perf_commit
89-
---------- -----------------------------------------
74+
id perf_commit
75+
---------- ----------------------------------------
9076
1 d9fd96f409a15429757030f225b082744a72516c
9177
```
9278

79+
### collector_progress
80+
81+
Keeps track of the collector's start and finish time as well as which step it's currently on.
82+
83+
```
84+
sqlite> select * from collector_progress limit 1;
85+
aid step start end
86+
---------- ---------- ---------- ----------
87+
1 helloworld 1625829961 1625829965
88+
```
89+
90+
### artifact_collection_duration
91+
92+
Records how long benchmarking takes in seconds.
93+
94+
```
95+
sqlite> select * from artifact_collection_duration limit 1;
96+
aid date_recorded duration
97+
---------- ------------- ----------
98+
1 1625829965 4
99+
```
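The relationship between the two tables above can be illustrated with a small sketch using Python's built-in `sqlite3` module and an in-memory database. The column names are taken from the sample output; the column types, the second (hypothetical) row, and the convention that a step still in progress has a NULL `end` are assumptions, not the actual schema definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names come from the sample output above; the types are assumed.
# "end" is quoted because END is an SQL keyword.
conn.executescript("""
CREATE TABLE collector_progress (aid INTEGER, step TEXT, start INTEGER, "end" INTEGER);
INSERT INTO collector_progress VALUES (1, 'helloworld', 1625829961, 1625829965);
-- Hypothetical unfinished step: assumed to have no end time yet.
INSERT INTO collector_progress VALUES (1, 'hypothetical-step', 1625829965, NULL);
""")

# Total time spent in finished steps, per artifact.
finished = conn.execute("""
    SELECT aid, SUM("end" - start)
    FROM collector_progress
    WHERE "end" IS NOT NULL
    GROUP BY aid
""").fetchall()

# Steps that have started but not finished yet.
in_progress = conn.execute("""
    SELECT aid, step
    FROM collector_progress
    WHERE start IS NOT NULL AND "end" IS NULL
""").fetchall()

print(finished, in_progress)
```

For the sample row, the finished time comes out to 4 seconds, which happens to agree with the `duration` shown in the `artifact_collection_duration` example.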
100+
101+
### benchmark
102+
103+
The different types of compile-time benchmarks that are run.
104+
105+
The table stores the name of the benchmark, whether it is capable of being run using the stable compiler, and its category.
106+
The benchmark name is used as a foreign key in many of the other tables.
107+
108+
Category is either `primary` (real-world benchmark) or `secondary` (stress test).
109+
Stable benchmarks have `category` set to `primary` and `stabilized` set to `1`.
110+
111+
```
112+
sqlite> select * from benchmark limit 1;
113+
name stabilized category
114+
---------- ---------- ----------
115+
helloworld 0 primary
116+
```
117+
93118
### pstat_series
94119

95-
A unique collection of crate, profile, cache and statistic.
120+
Describes the parametrization of a compile-time benchmark. Contains a unique combination
121+
of a crate, profile, scenario and the metric being collected.
96122

97-
* crate: the benchmarked crate which might be a crate from crates.io or a crate made specifically to stress some part of the compiler.
123+
* crate (aka `benchmark`): the benchmarked crate which might be a crate from crates.io or a crate made specifically to stress some part of the compiler.
98124
* profile: what type of compilation is happening - check build, optimized build (a.k.a. release build), debug build, or doc build.
99-
* cache: how much of the incremental cache is full. An empty incremental cache means that the compiler must do a full build.
100-
* statistic: the type of stat being collected
125+
* cache (aka `scenario`): describes how much of the incremental cache is full. An empty incremental cache means that the compiler must do a full build.
126+
* statistic (aka `metric`): the type of metric being collected
127+
128+
There is a separate table for this collection to avoid duplicating crates, profiles, scenarios etc.
129+
many times in the `pstat` table.
101130

102131
```
103132
sqlite> select * from pstat_series limit 1;
104-
id crate profile cache statistic
133+
id crate profile cache statistic
105134
---------- ---------- ---------- ---------- ------------
106135
1 helloworld check full task-clock:u
107136
```
108137

109138
### pstat
110139

111-
A statistic that is unique to a pstat_series, artifact and collection.
140+
A measured value of a compile-time metric that is unique to a `pstat_series`, `artifact` and a `collection`.
112141

113-
This stat is unique across a benchmarked crate, profile, cache state, statistic, rustc artifact, and benchmarks "collection".
142+
Each measured combination of a collection, rustc artifact, benchmarked crate, profile, scenario and a metric
143+
has its own unique entry in this table.
114144

115145
```
116146
sqlite> select * from pstat limit 1;
117-
series aid cid value
147+
series aid cid value
118148
---------- ---------- ---------- ----------
119-
1 1 1 24.93
149+
1 1 1 24.93
120150
```
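To make the relationship between `pstat`, `pstat_series`, `artifact` and `collection` concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The column names and sample values are taken from the examples above; the column types, keys and constraints are assumptions and may not match the real schema definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names follow the sample output above; types and constraints are assumed.
conn.executescript("""
CREATE TABLE artifact (id INTEGER PRIMARY KEY, name TEXT, date INTEGER, type TEXT);
CREATE TABLE collection (id INTEGER PRIMARY KEY, perf_commit TEXT);
CREATE TABLE pstat_series (
    id INTEGER PRIMARY KEY,
    crate TEXT, profile TEXT, cache TEXT, statistic TEXT,
    UNIQUE (crate, profile, cache, statistic)
);
CREATE TABLE pstat (
    series INTEGER REFERENCES pstat_series(id),
    aid INTEGER REFERENCES artifact(id),
    cid INTEGER REFERENCES collection(id),
    value REAL,
    PRIMARY KEY (series, aid, cid)
);
INSERT INTO artifact VALUES (1, 'LOCAL_TEST', NULL, 'release');
INSERT INTO collection VALUES (1, 'd9fd96f409a15429757030f225b082744a72516c');
INSERT INTO pstat_series VALUES (1, 'helloworld', 'check', 'full', 'task-clock:u');
INSERT INTO pstat VALUES (1, 1, 1, 24.93);
""")

# Resolve each measured value back to its artifact and benchmark parameters.
rows = conn.execute("""
    SELECT a.name, s.crate, s.profile, s.cache, s.statistic, p.value
    FROM pstat AS p
    JOIN pstat_series AS s ON s.id = p.series
    JOIN artifact AS a ON a.id = p.aid
""").fetchall()
print(rows)
```

Because `pstat` stores only three integer keys and a value, the textual parameters are written once in `pstat_series` and recovered with a join when needed.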
121151

152+
### runtime_benchmark
122153

123-
### self_profile_query_series
154+
The different types of runtime benchmarks that are run.
124155

125-
**TODO**
156+
The table currently stores only the name of the benchmark.
126157

127-
### self_profile_query
158+
```
159+
sqlite> select * from runtime_benchmark limit 1;
160+
name
161+
---------
162+
nbody-10k
163+
```
128164

129-
**TODO**
165+
### runtime_pstat_series
130166

131-
### pull_request_build
167+
Describes the parametrization of a runtime benchmark. Contains a unique combination
168+
of a benchmark and the metric being collected.
132169

133-
**TODO**
170+
This table exists to avoid duplicating benchmarks and metrics many times in the `runtime_pstat` table.
134171

135-
### artifact_collection_duration
172+
```
173+
sqlite> select * from runtime_pstat_series limit 1;
174+
id benchmark metric
175+
---------- --------- --------------
176+
1 nbody-10k instructions:u
177+
```
136178

137-
Records how long benchmarking takes in seconds.
179+
### runtime_pstat
180+
181+
A measured value of a runtime metric that is unique to a `runtime_pstat_series`, `artifact` and a `collection`.
182+
183+
Each measured combination of a collection, rustc artifact, benchmark and a metric
184+
has its own unique entry in this table.
138185

139186
```
140-
sqlite> select * from artifact_collection_duration limit 1;
141-
aid date_recorded duration
142-
---------- ------------- ----------
143-
1 1625829965 4
187+
sqlite> select * from runtime_pstat limit 1;
188+
series aid cid value
189+
---------- ---------- ---------- ----------
190+
1 1 1 24.93
144191
```
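The deduplication that the series table provides can be sketched as follows, again with Python's built-in `sqlite3` module. The `record` helper is hypothetical (it is not part of the project), the column types and constraints are assumed, and the second measured value is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names follow the sample output above; types and constraints are assumed.
conn.executescript("""
CREATE TABLE runtime_pstat_series (
    id INTEGER PRIMARY KEY,
    benchmark TEXT, metric TEXT,
    UNIQUE (benchmark, metric)
);
CREATE TABLE runtime_pstat (
    series INTEGER REFERENCES runtime_pstat_series(id),
    aid INTEGER, cid INTEGER, value REAL,
    PRIMARY KEY (series, aid, cid)
);
""")

def record(conn, benchmark, metric, aid, cid, value):
    """Hypothetical helper: store one measurement, reusing an existing series row."""
    conn.execute(
        "INSERT OR IGNORE INTO runtime_pstat_series (benchmark, metric) VALUES (?, ?)",
        (benchmark, metric),
    )
    (series,) = conn.execute(
        "SELECT id FROM runtime_pstat_series WHERE benchmark = ? AND metric = ?",
        (benchmark, metric),
    ).fetchone()
    conn.execute("INSERT INTO runtime_pstat VALUES (?, ?, ?, ?)", (series, aid, cid, value))
    return series

# Two artifacts measured for the same benchmark and metric share one series row.
first = record(conn, "nbody-10k", "instructions:u", aid=1, cid=1, value=24.93)
second = record(conn, "nbody-10k", "instructions:u", aid=2, cid=1, value=25.10)

series_count = conn.execute("SELECT COUNT(*) FROM runtime_pstat_series").fetchone()[0]
pstat_count = conn.execute("SELECT COUNT(*) FROM runtime_pstat").fetchone()[0]
print(series_count, pstat_count)
```

Both measurements point at the same series id, so the benchmark name and metric are stored only once.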
145192

146-
### collector_progress
193+
### self_profile_query_series
147194

148-
Keeps track of the collector's start and finish time as well as which step it's currently on.
195+
Describes a parametrization of a self-profile query. Contains a unique combination
196+
of a benchmark, profile, scenario and a `rustc` self-profile query.
197+
198+
This table exists to avoid duplicating benchmarks, profiles, scenarios etc. many times in the `self_profile_query` table.
149199

150200
```
151-
sqlite> select * from collector_progress limit 1;
152-
aid step start end
153-
---------- ---------- ---------- ----------
154-
1 helloworld 1625829961 1625829965
201+
sqlite> select * from self_profile_query_series limit 1;
202+
id crate profile cache query
203+
-- ----- ------- ---------- -----
204+
1 hello-world debug full hir_crate
205+
```
206+
207+
### self_profile_query
208+
209+
A measured value of a single `rustc` self-profile query that is unique to a `self_profile_query_series`, `artifact` and a `collection`.
210+
211+
```
212+
sqlite> select * from self_profile_query limit 1;
213+
series aid cid self_time blocked_time incremental_load_time number_of_cache_hits invocation_count
214+
-- ----- --- --------- ------------ --------------------- -------------------- ----------------
215+
1 42 58 11.8 10.2 8.4 224 408
155216
```
156217

157218
### rustc_compilation
158219

159-
**TODO**
220+
Records the duration of compiling a `rustc` crate for a given artifact and collection.
221+
222+
```
223+
sqlite> select * from rustc_compilation limit 1;
224+
aid cid crate duration
225+
--- --- ---------- --------
226+
1 42 rustc_mir_transform 28.096
227+
```
228+
229+
### raw_self_profile
230+
231+
Records that a given combination of artifact, collection, benchmark, profile and scenario
232+
has a self profile archive available. This profile is then downloaded through an endpoint -
233+
it is not stored in the database directly.
234+
235+
```
236+
sqlite> select * from raw_self_profile limit 1;
237+
aid cid crate profile cache
238+
--- --- ----- ------- -----
239+
1 42 hello-world debug full
240+
```
241+
242+
### pull_request_build
243+
244+
Records a pull request commit that is waiting in a queue to be benchmarked.
245+
246+
First a merge commit is queued, then its artifacts are built by bors, and once the commit
247+
is attached to the entry in this table, it can be benchmarked.
248+
249+
* bors_sha: SHA of the commit that should be benchmarked
250+
* pr: number of the PR
251+
* parent_sha: SHA of the parent commit, against which the PR will be compared
252+
* complete: boolean specifying whether this commit has already been benchmarked
253+
* requested: when the commit was queued
254+
* include: which benchmarks should be included (corresponds to the `--include` benchmark parameter)
255+
* exclude: which benchmarks should be excluded (corresponds to the `--exclude` benchmark parameter)
256+
* runs: how many iterations should be used by default for the benchmark run
257+
* commit_date: when the commit was created
258+
259+
```
260+
sqlite> select * from pull_request_build limit 1;
261+
bors_sha pr parent_sha complete requested include exclude runs commit_date
262+
---------- -- ---------- -------- --------- ------- ------- ---- -----------
263+
1w0p83... 42 fq24xq... true <timestamp> 3 <timestamp>
264+
```
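The queue states described above can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the column types, the use of integers for timestamps and booleans, the sample rows, and the idea that a not-yet-built entry has a NULL `bors_sha` are all assumptions inferred from the description, not the actual implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names follow the list above; types are assumed.
# "include" and "exclude" are quoted to avoid any clash with SQL keywords.
conn.execute("""
CREATE TABLE pull_request_build (
    bors_sha TEXT, pr INTEGER, parent_sha TEXT, complete INTEGER,
    requested INTEGER, "include" TEXT, "exclude" TEXT, runs INTEGER,
    commit_date INTEGER
)
""")

# Hypothetical rows: one finished, one built and waiting, one still unbuilt.
rows = [
    ("aaa111", 42, "ppp000", 1, 100, None, None, 3, 90),
    ("bbb222", 43, "ppp111", 0, 200, None, None, None, 190),
    (None, 44, None, 0, 300, None, None, None, None),
]
conn.executemany(
    "INSERT INTO pull_request_build VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows
)

# Entries whose commit is attached and that have not been benchmarked yet.
ready = conn.execute("""
    SELECT pr, bors_sha FROM pull_request_build
    WHERE complete = 0 AND bors_sha IS NOT NULL
    ORDER BY requested
""").fetchall()

# Entries that are queued but still waiting for their artifacts to be built.
waiting_for_build = conn.execute("""
    SELECT pr FROM pull_request_build
    WHERE complete = 0 AND bors_sha IS NULL
    ORDER BY requested
""").fetchall()

print(ready, waiting_for_build)
```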
160265

161266
### error_series
162267

163-
**TODO**
268+
Records a compile-time benchmark that caused an error.
269+
270+
This table exists to avoid duplicating benchmarks many times in the `error` table.
271+
272+
```
273+
sqlite> select * from error_series limit 1;
274+
id crate
275+
---------- -----------
276+
1 hello-world
277+
```
164278

165279
### error
166280

167-
**TODO**
281+
Records a compilation error for an artifact and an entry in `error_series`.
282+
283+
```
284+
sqlite> select * from error limit 1;
285+
series aid error
286+
---------- --- -----
287+
1 42 Failed to compile...
288+
```
