Commit 1458127

Merge pull request dolthub#2542 from dolthub/gitbook-dev
update perf and pickup outstanding dev changes
2 parents d1b149d + a0f0040 commit 1458127

3 files changed

+57
-86
lines changed

packages/dolt/content/reference/sql/benchmarks/correctness.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -54,7 +54,7 @@ AND col3 IN (3,9,0))))) OR col4 <= 4.25 OR ((col3 = 5))) OR (((col0 >
5454
0)) AND col0 > 6 AND (col4 >= 6.56)))
5555
```
5656

57-
Here are Dolt's sqllogictest results for version `1.51.1`. Tests that
57+
Here are Dolt's sqllogictest results for version `1.51.2`. Tests that
5858
did not run could not complete due to a timeout earlier in the run.
5959
<!-- START___DOLT___CORRECTNESS_RESULTS_TABLE -->
6060
| Results | Count |

packages/dolt/content/reference/sql/benchmarks/latency.md

Lines changed: 20 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -34,39 +34,39 @@ attempt to run as many queries as possible in a fixed 2 minute time
3434
window. The `Dolt` and `MySQL` columns show the median latency in
3535
milliseconds (ms) of each query during that 2 minute time window.
3636

37-
The Dolt version is `1.51.1`.
37+
The Dolt version is `1.51.2`.
3838

3939
<!-- START___DOLT___LATENCY_RESULTS_TABLE -->
4040
| Read Tests | MySQL | Dolt | Multiple |
4141
|-------------------------|-------|-------|----------|
42-
| covering\_index\_scan | 1.93 | 0.67 | 0.35 |
43-
| groupby\_scan | 13.46 | 17.95 | 1.33 |
44-
| index\_join | 1.47 | 2.43 | 1.65 |
42+
| covering\_index\_scan | 1.89 | 0.65 | 0.34 |
43+
| groupby\_scan | 13.46 | 17.63 | 1.31 |
44+
| index\_join | 1.47 | 2.39 | 1.63 |
4545
| index\_join\_scan | 1.44 | 1.44 | 1.0 |
46-
| index\_scan | 34.95 | 30.81 | 0.88 |
47-
| oltp\_point\_select | 0.18 | 0.27 | 1.5 |
48-
| oltp\_read\_only | 3.49 | 5.28 | 1.51 |
49-
| select\_random\_points | 0.34 | 0.61 | 1.79 |
46+
| index\_scan | 34.33 | 30.81 | 0.9 |
47+
| oltp\_point\_select | 0.18 | 0.26 | 1.44 |
48+
| oltp\_read\_only | 3.43 | 5.18 | 1.51 |
49+
| select\_random\_points | 0.33 | 0.59 | 1.79 |
5050
| select\_random\_ranges | 0.37 | 0.62 | 1.68 |
51-
| table\_scan | 34.95 | 31.37 | 0.9 |
52-
| types\_table\_scan | 75.82 | 116.8 | 1.54 |
53-
| reads\_mean\_multiplier | | | 1.28 |
51+
| table\_scan | 34.95 | 32.53 | 0.93 |
52+
| types\_table\_scan | 75.82 | 134.9 | 1.78 |
53+
| reads\_mean\_multiplier | | | 1.3 |
5454

5555
| Write Tests | MySQL | Dolt | Multiple |
5656
|--------------------------|-------|-------|----------|
5757
| oltp\_delete\_insert | 8.9 | 6.32 | 0.71 |
58-
| oltp\_insert | 4.1 | 3.13 | 0.76 |
59-
| oltp\_read\_write | 9.06 | 11.65 | 1.29 |
58+
| oltp\_insert | 4.1 | 3.07 | 0.75 |
59+
| oltp\_read\_write | 8.9 | 11.45 | 1.29 |
6060
| oltp\_update\_index | 4.18 | 3.19 | 0.76 |
61-
| oltp\_update\_non\_index | 4.18 | 3.13 | 0.75 |
62-
| oltp\_write\_only | 5.77 | 6.32 | 1.1 |
63-
| types\_delete\_insert | 8.43 | 6.67 | 0.79 |
61+
| oltp\_update\_non\_index | 4.18 | 3.07 | 0.73 |
62+
| oltp\_write\_only | 5.67 | 6.32 | 1.11 |
63+
| types\_delete\_insert | 8.28 | 6.67 | 0.81 |
6464
| writes\_mean\_multiplier | | | 0.88 |
6565

66-
| TPC-C TPS Tests | MySQL | Dolt | Multiple |
67-
|-----------------------|-------|-------|----------|
68-
| tpcc-scale-factor-1 | 96.16 | 39.18 | 2.45 |
69-
| tpcc\_tps\_multiplier | | | 2.45 |
66+
| TPC-C TPS Tests | MySQL | Dolt | Multiple |
67+
|-----------------------|-------|------|----------|
68+
| tpcc-scale-factor-1 | 97.8 | 40.1 | 2.44 |
69+
| tpcc\_tps\_multiplier | | | 2.44 |
7070

7171
| Overall Mean Multiple | 1.54 |
7272
|-----------------------|------|

packages/dolt/content/reference/sql/sql-support/miscellaneous.md

Lines changed: 36 additions & 65 deletions
Original file line numberDiff line numberDiff line change
@@ -88,8 +88,6 @@ SELECT * from information_schema.tables;
8888
+-------------+------------+-------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
8989
```
9090

91-
Statistics are persisted in database's chunk store in a `refs/stats` ref stored separately from the commit graph. Each database has its own statistics store. The contents of the `refs/stats` reflect a single point-in-time for a single branch and are un-versioned. The contents of this ref in the current database can be inspected with the `dolt_statistics` system table.
92-
9391
```sql
9492
create table horses (id int primary key, name varchar(10), key(name));
9593
insert into horses select x, 'Steve' from (with recursive inputs(x) as (select 1 union select x+1 from inputs where x < 1000) select * from inputs) dt;
@@ -111,90 +109,63 @@ select `index`, `position`, row_count, distinct_count, columns, upper_bound, upp
111109
+---------+----------+-----------+----------------+----------+-------------+-----------------+-----------+
112110
```
113111

114-
### Auto-Refresh
115-
116-
Static statistics become stale quickly for tables that change frequently. Users can choose to manually manage run `ANALYZE` statements, or use some form of auto-refresh.
117-
118-
Auto-refresh statistic updates work the same way as partial `ANALYZE` updates. A table's "former" and "new" chunk set will 1) share common chunks preexisting in "former" 2) differ by deleted chunks only in the "former" table, and 3) differ by new chunks in the "new" table. This mirrors Dolt's inherent structural sharing. Rather than forcing an update on every refresh interval, we can toggle how many changes triggers the update.
119-
120-
When the auto-refresh threshold is 0%, the auto-refresh thread behaves like a cron job that runs `ANALYZE` periodically.
121-
122-
Setting a non-zero threshold defers updates until after a certain fraction of chunks are edited. For example, a 100% difference threshold updates stats when:
123-
124-
1) The table was previously empty and now contains data.
125-
126-
2) The table grew or shrank such that the tree height grew or shrank, and therefore the target fanout level changed.
127-
128-
3) Inserts added twice as many chunks.
129-
130-
4) Deletes removed 100% of the preexisting chunks.
112+
### Disable
131113

132-
5) 50% of the chunks were edited (an in-place edit deletes one chunk and adds one chunk, for a total of two changes relative to the original chunk)
133-
134-
Any combination of edits/inserts/deletes that exceeds the trigger threshold will also update stats.
135-
136-
We enable refresh with one mandatory and two optional system variables:
114+
Some workloads, like batch imports, perform strictly better without the overhead of statistics collection. In these cases, we can explicitly stop or purge (stop + delete) statistics on a running server:
137115

138116
```sql
139-
dolt sql -q "set @@PERSIST.dolt_stats_auto_refresh_enabled = 1;"
140-
dolt sql -q "set @@PERSIST.dolt_stats_auto_refresh_interval = 120;"
141-
dolt sql -q "set @@PERSIST.dolt_stats_auto_refresh_threshold = 0.5"
117+
call dolt_stats_stop();
118+
call dolt_stats_purge();
142119
```
143120

144-
The first enables auto-refresh. It is a global variable that must be set during `dolt sql-server` startup and affects all databases in a server context. Databases added or dropped to a running server automatically opt-in to statistics refresh if enabled.
145-
146-
The second two variables configure 1) how often a timer wakes up to check stats freshness (seconds), and 2) the threshold updating a table's active statistics (new+deleted/previous chunks as a percentage between 0-1). For example, `dolt_stats_auto_refresh_interval = 600` means the server only attempt to update stats every 10 minutes, regardless of how much a table has changed. Setting `dolt_stats_auto_refresh_threshold = 0` forces stats to update in response to any table change.
147-
148-
A last variable blocks statistics from loading from disk on startup, or writing to disk on ANALYZE:
121+
A stopped-stats server can be restarted, or have a single collection cycle performed by an operator:
149122

150123
```sql
151-
dolt sql -q "set @@PERSIST.dolt_stats_memory_only = 1"
124+
call dolt_stats_restart();
125+
call dolt_stats_once();
152126
```
153127

154-
### Stats Controller Functions
128+
A persisted system variable can disable statistics across server restarts:
155129

156-
Dolt exposes a set of helper functions for managing statistics collection and use:
130+
```sql
131+
-- on version 1.51.0 or higher
132+
SET @@PERSIST.dolt_stats_enabled = 0;
157133

158-
- `dolt_stats_drop()`: Deletes the stats ref on disk and wipes the database stats held in memory for the current database.
134+
-- up to 1.50.x
135+
SET @@PERSIST.dolt_stats_auto_refresh_enabled = 0;
136+
```
159137

160-
- `dolt_stats_stop()`: Cancels active auto-refresh threads for the current database.
138+
A server rebooted with stats turned off currently has no mechanism to turn them back on. All stats operations are no-ops
139+
if a server starts with the above variables set.
161140

162-
- `dolt_stats_restart()`: Stops and restarts a refresh thread for the current database with the current session's interval and threshold variables.
141+
### Auto-Refresh
163142

164-
- `dolt_stats_status()`: Returns the latest update to statistics for the current database.
143+
Statistics automatically update for servers by default. Stats are stored in a database in `.dolt/stats` separate from user data. This folder can safely be deleted offline.
165144

166-
- `dolt_stats_prune()`: Garbage collects the statistics cache storage, retaining only
167-
the most recent statistic updates.
145+
Stats throughput can be lowered by raising the `dolt_stats_job_interval` variable, which sets the delay in milliseconds between processing steps. The higher the delay and the more branches in a database, the longer it takes for statistics updates to materialize. High delays reduce the fraction of runtime resources diverted to managing background statistics.
168146

169-
- `dolt_stats_purge()`: Deletes the old statistics cache from the
170-
filesystem. This can be used to silence warnings from backwards
171-
incompatible upgrades. Statistics will need
172-
to be recollected, which can be time consuming.
147+
Stats can be disabled with the `dolt_stats_enabled=0` variable.
173148

174-
### Performance
149+
Stats persistence can be disabled with the `dolt_stats_memory_only=1` variable.
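
As a hedged sketch of the settings described in this section, the statements below assume these variables can be persisted with the same `SET @@PERSIST` syntax this page uses for `dolt_stats_enabled`. The interval value of 500 is illustrative only, and whether a running server picks up each change without a restart is not stated here.

```sql
-- Slow down background stats work: delay in milliseconds between processing steps.
-- 500 is a made-up example value, not a recommendation.
SET @@PERSIST.dolt_stats_job_interval = 500;

-- Keep statistics in memory only, skipping persistence.
SET @@PERSIST.dolt_stats_memory_only = 1;

-- Turn statistics off entirely.
SET @@PERSIST.dolt_stats_enabled = 0;
```

These three statements are independent; in practice an operator would likely pick only the one that matches the workload.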
175150

176-
Lowering check intervals and update thresholds increases the refresh read and write load. Refreshing statistics uses shortcuts to avoid reading from disk when possible, but in most cases at least needs to read the target fanout level of the tree from disk to compare previous and current chunk sets. Exceeding the refresh threshold reads all data from disk associated with the new chunk ranges, which will be the most expensive impact of auto-refresh. Dolt uses ordinal offsets to avoid reading unnecessary data, but the tree growing or shrinking by a level forces a full tablescan.
151+
### Stats Garbage Collection
177152

178-
For example, setting the check interval to 0 seconds (constant), the update threshold to 0 (any change triggers refresh) reduces the `oltp_read_write` sysbench benchmark's throughput by 15%. An increase in the update threshold for a 0-interval reduces throughput even more. On the other hand, basically any non-zero interval reduces the fraction of time spent performing stats updates to a negligible level:
153+
The stats in-memory cache accumulates new histograms proportionally to the write rate and stats update rate. Periodically, an
154+
update cycle will swap the currently active histogram buckets to a new in-memory map and clear the old set.
179155

180-
| interval(s) | threshold(%) | latency |
181-
|------------|---------------|----------|
182-
| 0 | 0 | -15% |
183-
| 0 | 1 | -46% |
184-
| 0 | 10 | -45% |
185-
| 1 | 0 | -.1% |
186-
| 1 | 1 | 0% |
156+
Stats garbage collection can be disabled with the `dolt_stats_gc_enabled=0` variable.
187157

188-
A small set of TPC-C run with one thread has a similar pattern compared to the baseline values, comparing queries per second (qps) now:
158+
Garbage collection frequency can be tuned with the `dolt_stats_gc_interval` variable (default 1 hour).
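
A minimal sketch of working with the garbage collection variables, assuming they behave like the other persisted stats variables on this page. The unit expected by `dolt_stats_gc_interval` is not given here, so the example only reads the current value rather than guessing at one to set.

```sql
-- Inspect the current GC interval before changing it (the default is described as 1 hour).
SELECT @@dolt_stats_gc_interval;

-- Disable stats garbage collection.
SET @@PERSIST.dolt_stats_gc_enabled = 0;
```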
189159

190-
| interval(s) | threshold(%) | qps |
191-
|-------------|--------------|------|
192-
| 0 | 0 | -15% |
193-
| 0 | 1 | -26% |
194-
| 0 | 10 | -10% |
195-
| 1 | 0 | -4% |
196-
| 1 | 1 | 0% |
160+
### Stats Controller Functions
197161

198-
Statistics' usefulness is rarely improved by immediate updates. Updating every minute or hour is probably fine for most workloads. If you do need quick statistics updates, performing them immediately instead of in batches appears to be preferable with the current implementation tradeoffs.
162+
Dolt exposes a set of helper procedures for managing statistics collection and use:
199163

200-
Statistics also have read performance implications, expensing more compute cycles to obtain better join cost estimates. Histograms with the maximum bucket fanout will be the most expensive to use. That said, at the time of writing this sysbench read benchmarks are not impacted by stats estimate overhead. Behavior for custom workloads will depend on read/write/freshness trade-offs.
164+
- `dolt_stats_stop`: clear queue and disable thread
165+
- `dolt_stats_restart`: clear queue, refresh queue, start thread
166+
- `dolt_stats_purge`: clear queue, refresh queue, clear cache, disable thread
167+
- `dolt_stats_once`: collect statistics once, ex: in sql-shell
168+
- `dolt_stats_wait`: block on a full queue cycle
169+
- `dolt_stats_gc`: block waiting for a GC signal
170+
- `dolt_stats_flush`: block waiting for a flush signal
171+
- `dolt_stats_info`: print the current state of the stats provider (optional `'-short'` flag)
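
The procedures listed above could be combined as in the following sketch. It assumes each one can be invoked with `call` and no arguments, as the `dolt_stats_stop`, `dolt_stats_purge`, and `dolt_stats_once` examples earlier in this page do; the zero-argument forms of `dolt_stats_wait` and `dolt_stats_info` are an assumption.

```sql
-- Pause statistics collection before a batch import.
call dolt_stats_stop();

-- ... run the import here ...

-- Resume collection, then block until one full queue cycle completes.
call dolt_stats_restart();
call dolt_stats_wait();

-- Print the current state of the stats provider.
call dolt_stats_info();
```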
