Update database documentation

Kobzol · Kobzol · commit 8b5aeac338be · 2023-06-03T18:52:46.000+02:00
diff --git a/database/schema.md b/database/schema.md
@@ -2,18 +2,19 @@
 
 Below is an explanation of the current database schema. This schema is duplicated across the (currently) two database backends we support: sqlite and postgres.
 
-
 ## Overview
 
 In general, the database is used to track three groups of things:
-* Performance run statistics (e.g., instruction count) on a per benchmark, profile, and cache-state basis.
+* Performance run statistics (e.g., instruction count) for compile-time benchmarks on a per benchmark, profile, and scenario basis.
+* Performance run statistics (e.g., instruction count) for runtime benchmarks on a per benchmark basis.
 * Self profile data gathered with `-Zself-profile`.
-* State when running GitHub bots and the performance runs (e.g., how long it took for a performance suite to run, errors encountered a long the way, etc.)
+* State when running GitHub bots and the performance runs (e.g., how long it took for a performance suite to run, errors encountered along the way, etc.)
 
 Below are some diagrams showing the basic layout of the database schema for these three uses:
 
 ### Performance run statistics
 
+Here is the diagram for compile-time benchmarks:
 ```
   ┌────────────┐  ┌───────────────┐  ┌────────────┐   
   │ benchmark  │  │ collection    │  │ artifact   │
@@ -36,132 +37,252 @@ Below are some diagrams showing the basic layout of the database schema for thes
   └───────────────┘  └──────────┘
 ```
 
-### Self profile data
-
-**TODO**
-
-### Miscellaneous State
-
-**TODO**
+For runtime benchmarks the schema is very similar, but there are different table names:
+- `benchmark` => `runtime_benchmark`
+- `pstat` => `runtime_pstat`
+- `pstat_series` => `runtime_pstat_series`
+  - There are different attributes here, `benchmark` and `metric`.
 
 ## Tables
 
-### benchmark
-
-The different types of benchmarks that are run. 
-
-The table stores the name of the benchmark as well as whether it is capable of being run using the stable compiler.  The benchmark name is used as a foreign key in many of the other tables. 
-
-```
-sqlite> select * from benchmark limit 1;
-name        stabilized
-----------  ----------
-helloworld  0   
-```
-
 ### artifact
 
-A description of a rustc compiler artifact being benchmarked. 
+A description of a rustc compiler artifact being benchmarked.
 
 This description includes:
 * name: usually a commit sha or a tag like "1.51.0" but is free-form text so can be anything.
-* date: the date associated with this compiler artifact (usually only when the name is a commit) 
+* date: the date associated with this compiler artifact (usually only when the name is a commit)
 * type: currently one of "master" (i.e., we're testing a merge commit), "try" (someone is testing a PR), and "release" (usually a release candidate - though local compilers also get labeled like this).
 
 ```
 sqlite> select * from artifact limit 1;
-id          name        date        type      
-----------  ----------  ----------  ----------
-1           LOCAL_TEST              release  
+id          name        date        type   
+----------  ----------  ----------  -------
+1           LOCAL_TEST              release
 ```
 
 ### collection
 
 A "collection" of benchmarks tied only differing by the statistic collected.
 
-This is a way to collect statistics together signifying that they belong to the same logical benchmark run. 
+This is a way to collect statistics together signifying that they belong to the same logical benchmark run.
 
-Currently the collection also marks the git sha of the currently running collector binary.
+Currently, the collection also marks the git sha of the currently running collector binary.
 
 ```
 sqlite> select * from collection limit 1;
-id          perf_commit                              
-----------  -----------------------------------------
+id          perf_commit 
+----------  ----------------------------------------
 1           d9fd96f409a15429757030f225b082744a72516c
 ```
 
+### collector_progress
+
+Keeps track of the collector's start and finish time as well as which step it's currently on.
+
+```
+sqlite> select * from collector_progress limit 1;
+aid         step        start       end
+----------  ----------  ----------  ----------
+1           helloworld  1625829961  1625829965
+```
+
+### artifact_collection_duration
+
+Records how long benchmarking takes in seconds.
+
+```
+sqlite> select * from artifact_collection_duration limit 1;
+aid         date_recorded  duration
+----------  -------------  ----------
+1           1625829965     4
+```
+
+### benchmark
+
+The different types of compile-time benchmarks that are run. 
+
+The table stores the name of the benchmark, whether it is capable of being run using the stable compiler, and its category.
+The benchmark name is used as a foreign key in many of the other tables.
+
+Category is either `primary` (real-world benchmark) or `secondary` (stress test).
+Stable benchmarks have `category` set to `primary` and `stabilized` set to `1`.
+
+```
+sqlite> select * from benchmark limit 1;
+name        stabilized  category
+----------  ----------  ----------
+helloworld  0           primary
+```
+
 ### pstat_series
 
-A unique collection of crate, profile, cache and statistic.
+Describes the parametrization of a compile-time benchmark. Contains a unique combination
+of a crate, profile, scenario and the metric being collected.
 
-* crate: the benchmarked crate which might be a crate from crates.io or a crate made specifically to stress some part of the compiler.
+* crate (aka `benchmark`): the benchmarked crate which might be a crate from crates.io or a crate made specifically to stress some part of the compiler.
 * profile: what type of compilation is happening - check build, optimized build (a.k.a. release build), debug build, or doc build.
-* cache: how much of the incremental cache is full. An empty incremental cache means that the compiler must do a full build.
-* statistic: the type of stat being collected
+* cache (aka `scenario`): describes how much of the incremental cache is full. An empty incremental cache means that the compiler must do a full build.
+* statistic (aka `metric`): the type of metric being collected
+
+There is a separate table for this collection to avoid duplicating crates, profiles, scenarios etc.
+many times in the `pstat` table.
 
 ```
 sqlite> select * from pstat_series limit 1;
-id          crate       profile     cache       statistic   
+id          crate       profile     cache       statistic
 ----------  ----------  ----------  ----------  ------------
 1           helloworld  check       full        task-clock:u
 ```
 
 ### pstat
 
-A statistic that is unique to a pstat_series, artifact and collection.
+A measured value of a compile-time metric that is unique to a `pstat_series`, `artifact` and a `collection`.
 
-This stat is unique across a benchmarked crate, profile, cache state, statistic, rustc artifact, and benchmarks "collection".
+Each measured combination of a collection, rustc artifact, benchmarked crate, profile, scenario and a metric
+has its own unique entry in this table.
 
 ```
 sqlite> select * from pstat limit 1;
-series      aid         cid         value     
+series      aid         cid         value
 ----------  ----------  ----------  ----------
-1           1           1           24.93   
+1           1           1           24.93
 ```
 
+### runtime_benchmark
 
-### self_profile_query_series
+The different types of runtime benchmarks that are run.
 
-**TODO**
+The table currently stores only the name of the benchmark.
 
-### self_profile_query
+```
+sqlite> select * from runtime_benchmark limit 1;
+name
+---------
+nbody-10k
+```
 
-**TODO**
+### runtime_pstat_series
 
-### pull_request_build
+Describes the parametrization of a runtime benchmark. Contains a unique combination
+of a benchmark and the metric being collected.
 
-**TODO**
+This table exists to avoid duplicating benchmarks and metrics many times in the `runtime_pstat` table.
 
-### artifact_collection_duration
+```
+sqlite> select * from runtime_pstat_series limit 1;
+id          benchmark  metric
+----------  ---------  --------------
+1           nbody-10k  instructions:u
+```
 
-Records how long benchmarking takes in seconds.
+### runtime_pstat
+
+A measured value of a runtime metric that is unique to a `runtime_pstat_series`, `artifact` and a `collection`.
+
+Each measured combination of a collection, rustc artifact, benchmark and a metric
+has its own unique entry in this table.
 
 ```
-sqlite> select * from artifact_collection_duration limit 1;
-aid         date_recorded  duration  
-----------  -------------  ----------
-1           1625829965     4 
+sqlite> select * from runtime_pstat limit 1;
+series      aid         cid         value
+----------  ----------  ----------  ----------
+1           1           1           24.93
 ```
 
-### collector_progress
+### self_profile_query_series
 
-Keeps track of the collector's start and finish time as well as which step it's currently on.
+Describes a parametrization of a self-profile query. Contains a unique combination
+of a benchmark, profile, scenario and a `rustc` self-profile query.
+
+This table exists to avoid duplicating benchmarks, profiles, scenarios etc. many times in the `self_profile_query` table.
 
 ```
-sqlite> select * from collector_progress limit 1;
-aid         step        start       end       
-----------  ----------  ----------  ----------
-1           helloworld  1625829961  1625829965
+sqlite> select * from self_profile_query_series limit 1;
+id  crate        profile  cache  query
+--  -----------  -------  -----  ---------
+1   hello-world  debug    full   hir_crate
+```
+
+### self_profile_query
+
+A measured value of a single `rustc` self-profile query that is unique to a `self_profile_query_series`, `artifact` and a `collection`.
+
+```
+sqlite> select * from self_profile_query limit 1;
+series  aid    cid  self_time  blocked_time  incremental_load_time  number_of_cache_hits  invocation_count
+--      -----  ---  ---------  ------------  ---------------------  --------------------  ----------------
+1       42     58   11.8       10.2          8.4                    224                   408
 ```
 
 ### rustc_compilation
 
-**TODO**
+Records the duration of compiling a `rustc` crate for a given artifact and collection.
+
+```
+sqlite> select * from rustc_compilation limit 1;
+aid  cid  crate                duration
+---  ---  ----------           --------
+1    42   rustc_mir_transform  28.096
+```
+
+### raw_self_profile
+
+Records that a given combination of artifact, collection, benchmark, profile and scenario
+has a self profile archive available. This profile is then downloaded through an endpoint -
+it is not stored in the database directly.
+
+```
+sqlite> select * from raw_self_profile limit 1;
+aid  cid  crate        profile  cache
+---  ---  -----        -------  -----
+1    42   hello-world  debug    full
+```
+
+### pull_request_build
+
+Records a pull request commit that is waiting in a queue to be benchmarked.
+
+First a merge commit is queued, then its artifacts are built by bors, and once the commit
+is attached to the entry in this table, it can be benchmarked.
+
+* bors_sha: SHA of the commit that should be benchmarked
+* pr: number of the PR
+* parent_sha: SHA of the parent commit, to which the PR will be compared
+* complete: bool specifying whether this commit has already been benchmarked
+* requested: when the commit was queued
+* include: which benchmarks should be included (corresponds to the `--include` benchmark parameter)
+* exclude: which benchmarks should be excluded (corresponds to the `--exclude` benchmark parameter)
+* runs: how many iterations should be used by default for the benchmark run
+* commit_date: when the commit was created
+
+```
+sqlite> select * from pull_request_build limit 1;
+bors_sha    pr  parent_sha  complete  requested    include  exclude  runs  commit_date
+----------  --  ----------  --------  ---------    -------  -------  ----  -----------
+1w0p83...   42  fq24xq...   true      <timestamp>                    3     <timestamp>
+```
 
 ### error_series
 
-**TODO**
+Records a compile-time benchmark that caused an error.
+
+This table exists to avoid duplicating benchmarks many times in the `error` table.
+
+```
+sqlite> select * from error_series limit 1;
+id          crate
+----------  -----------
+1           hello-world
+```
 
 ### error
 
-**TODO**
+Records a compilation error for an artifact and an entry in `error_series`.
+
+```
+sqlite> select * from error limit 1;
+series      aid  error
+----------  ---  -----
+1           42   Failed to compile...
+```
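
The split between `pstat_series` and `pstat` described in this change can be illustrated with a small in-memory SQLite sketch. The table definitions below are simplified assumptions inferred from the sample rows in the document; the real schema may use different column types, keys, and constraints. The helper names `build_demo_db` and `measurements` are hypothetical and exist only for this illustration.

```python
import sqlite3

# Assumed, simplified table definitions inferred from the sample output above.
# The actual schema in the repository may differ in types and constraints.
SCHEMA = """
CREATE TABLE artifact (id INTEGER PRIMARY KEY, name TEXT, date INTEGER, type TEXT);
CREATE TABLE collection (id INTEGER PRIMARY KEY, perf_commit TEXT);
CREATE TABLE pstat_series (
    id INTEGER PRIMARY KEY,
    crate TEXT, profile TEXT, cache TEXT, statistic TEXT,
    UNIQUE (crate, profile, cache, statistic)
);
CREATE TABLE pstat (
    series INTEGER REFERENCES pstat_series(id),
    aid INTEGER REFERENCES artifact(id),
    cid INTEGER REFERENCES collection(id),
    value REAL,
    PRIMARY KEY (series, aid, cid)  -- assumption: one value per combination
);
"""


def build_demo_db():
    """Create an in-memory database holding the sample rows from the document."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO artifact VALUES (1, 'LOCAL_TEST', NULL, 'release')")
    conn.execute(
        "INSERT INTO collection VALUES (1, 'd9fd96f409a15429757030f225b082744a72516c')"
    )
    conn.execute(
        "INSERT INTO pstat_series VALUES (1, 'helloworld', 'check', 'full', 'task-clock:u')"
    )
    conn.execute("INSERT INTO pstat VALUES (1, 1, 1, 24.93)")
    conn.commit()
    return conn


def measurements(conn, artifact_name):
    """Return (crate, profile, cache, statistic, value) rows for one artifact."""
    query = """
        SELECT s.crate, s.profile, s.cache, s.statistic, p.value
        FROM pstat AS p
        JOIN pstat_series AS s ON s.id = p.series
        JOIN artifact AS a ON a.id = p.aid
        WHERE a.name = ?
        ORDER BY s.id
    """
    return conn.execute(query, (artifact_name,)).fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    for row in measurements(demo, "LOCAL_TEST"):
        print(row)
```

Because the crate, profile, scenario and metric are stored once in `pstat_series`, each `pstat` row only carries three integer keys and the measured value. The same join shape would presumably apply to `runtime_pstat_series` and `runtime_pstat`, with `benchmark` and `metric` as the series attributes.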