Commit d45fd47

Revise the part on the upsert benchmarks
1 parent 592a11e commit d45fd47

1 file changed

+47
-46
lines changed

doc/final-report/final-report.md

Lines changed: 47 additions & 46 deletions
Original file line number | Diff line number | Diff line change
@@ -1355,56 +1355,57 @@ assurance that the actual implementation is correct. This is important in
13551355
general but especially so for the pipelined implementation, which is
13561356
non-trivial.
13571357
1358-
## The "Upsert" Benchmark
1358+
## The upsert benchmarks
13591359
1360-
Performance requirement 6 states:
1360+
Item 6 of the performance requirements states the following:
13611361
13621362
> A benchmark should demonstrate that the performance characteristics of the
13631363
> monoidal update operation should be similar to that of the insert or delete
1364-
> operations, and substantially better than the combination of a lookup
1365-
> followed by an insert.
1366-
1367-
The `lsm-tree` library and documentation now uses the term "upsert" for this
1368-
monoidal update operation, to follow standard database terminology.
1369-
1370-
Based on the requirement above there are two (pairs of) benchmarks:
1371-
1372-
1. A benchmark of the time to insert a (large) number of (batches of) key-value
1373-
pairs; and a benchmark of the time to upsert the same sequence of
1374-
key-value pairs.
1375-
1376-
2. A benchmark of the time to repeatedly upsert values for a set of keys (so
1377-
each key is updated several times); and a benchmark of the time to
1378-
repeatedly update the same keys by the combination of lookup and insert
1379-
(with accumulation). This uses lookups and inserts in batches. The set of
1380-
keys is looked up, and the existing values are combined with the new values.
1381-
The same set of key-value pairs are used 10 times, so that there are 10
1382-
updates per key (either lookup and insert, or upsert).
1383-
1384-
Each benchmark uses:
1385-
1386-
* 64bit keys and values
1387-
* values are combined using addition;
1388-
* 80,000 elements (generated using a PRNG),
1389-
* batches of size 250;
1390-
* no disk caching;
1391-
* a write buffer of 1,000 elements.
1392-
1393-
These benchmarks are implemented using `criterion`, which performs multiple
1394-
runs and combines the results in a sound statistical manner. The reported
1395-
variance was relatively low. The benchmarks were executed on the dev laptop
1396-
machine, however the absolute times of these benchmarks is of little interest.
1397-
The interesting point is the relative timings.
1398-
1399-
The result are as follows
1400-
1401-
1. Less than 0.4 % difference in timing between insert and upsert (932.8ms vs
1402-
929.4ms). This clearly qualifies as "similar".
1403-
1404-
2. The combination of lookup and insert takes 2.4 times as long as using upsert
1405-
(2.857s vs 1.188s). We can thus reasonably conclude that the performance of
1406-
upsert is "substantially better" than the combination of a lookup followed
1407-
by an insert.
1364+
> operations, and substantially better than the combination of a lookup followed
1365+
> by an insert.
1366+
1367+
As already mentioned in [the discussion on functional
1368+
requirement 4](#requirement-4), the `lsm-tree` library and its documentation now
1369+
use the term ‘upsert’ for this monoidal update operation, to follow standard
1370+
database terminology.
1371+
1372+
In line with the above requirement, we have created the following benchmarks:
1373+
1374+
* A benchmark of the time to insert a large number of key–value pairs using the
1375+
insert operation and a benchmark of the time to insert the same key–value
1376+
pairs using the upsert operation
1377+
1378+
* A benchmark of the time to repeatedly upsert values of certain keys and a
1379+
benchmark of the time to repeatedly update the values of these keys by looking
1380+
up their current values, modifying them and writing them back
1381+
1382+
The benchmarks use the following parameters:
1383+
1384+
* 64 bits as the size of keys and values
1385+
* 80,000 elements (generated using a PRNG)
1386+
* Addition as the update operation
1387+
* 250 operations per batch
1388+
* No disk caching
1389+
* 1,000 elements as the write buffer capacity
1390+
* 10 updates per key in the case of the second pair of benchmarks
1391+
1392+
The benchmarks are implemented using Criterion, which performs multiple
1393+
benchmark runs and combines the results in a sound statistical manner. For our
1394+
benchmarks, the variance of the results across the different runs, as reported
1395+
by Criterion, is relatively low. We have executed the benchmarks on the dev
1396+
laptop machine. However, the absolute running times of these benchmarks are of
1397+
little interest; the interesting point is the relative timings.
1398+
1399+
The results are as follows:
1400+
1401+
* The difference in running time between the insert and the corresponding upsert
1402+
benchmark is less than 0.4 % (932.8 ms vs. 929.4 ms), so that insert and
1403+
upsert performance clearly qualify as ‘similar’.
1404+
1405+
* Using the combination of lookup and insert takes 2.4 times as long as using
1406+
upsert (2.857 s vs. 1.188 s). We can thus reasonably conclude that the
1407+
performance of upsert is ‘substantially better’ than the performance of a
1408+
lookup followed by an insert.
14081409
14091410
# References {-}
14101411
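
The second pair of benchmarks in the revised text compares two ways of reaching the same final table state: a blind upsert, and a batched lookup followed by an insert of the combined values. The following is a minimal sketch of that equivalence. It is not the `lsm-tree` API: the dictionary-backed `Store`, the function names, and the wrap-around to 64 bits are all assumptions made for illustration, and the sketch further assumes that keys are distinct within a batch.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a table with 64-bit keys and values.
Store = Dict[int, int]
Batch = List[Tuple[int, int]]

MASK64 = (1 << 64) - 1  # assumed wrap-around for 64-bit addition


def upsert_batch(store: Store, batch: Batch) -> None:
    """Combine each new value with the existing one (addition), or insert it."""
    for key, delta in batch:
        store[key] = (store.get(key, 0) + delta) & MASK64


def lookup_then_insert_batch(store: Store, batch: Batch) -> None:
    """Same update as two explicit steps: a batch lookup, then a batch insert."""
    # Step 1: look up the current values for every key in the batch.
    current = [store.get(key, 0) for key, _ in batch]
    # Step 2: combine old and new values and write the results back.
    for (key, delta), old in zip(batch, current):
        store[key] = (old + delta) & MASK64


if __name__ == "__main__":
    batches = [[(1, 5), (2, 7)], [(1, 3), (3, 1)]]
    a: Store = {}
    b: Store = {}
    for batch in batches:
        upsert_batch(a, batch)
        lookup_then_insert_batch(b, batch)
    print(a == b)
```

The sketch only shows that both approaches produce the same table contents. It says nothing about cost: in a dictionary both variants read the old value, so it cannot reproduce the timing difference the benchmarks measure.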

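The two relative figures quoted in the results ("less than 0.4 %" and "2.4 times as long") can be rechecked from the four absolute timings in the text. The short calculation below does only that. The helper name `relative_difference` and the choice of the smaller timing as the baseline are assumptions, since the report does not say which baseline it used.

```python
def relative_difference(a: float, b: float) -> float:
    """Absolute difference relative to the smaller value (assumed baseline)."""
    return abs(a - b) / min(a, b)


# Timings as quoted in the report.
insert_ms, upsert_ms = 932.8, 929.4          # first pair of benchmarks
lookup_insert_s, upsert_s = 2.857, 1.188     # second pair of benchmarks

diff_percent = 100 * relative_difference(insert_ms, upsert_ms)
slowdown = lookup_insert_s / upsert_s

print(f"insert vs. upsert: {diff_percent:.2f} % difference")
print(f"lookup+insert vs. upsert: {slowdown:.2f}x as long")
```

Using the larger timing as the baseline instead changes the first figure only marginally, so the "less than 0.4 %" claim holds either way.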