@@ -1355,56 +1355,57 @@ assurance that the actual implementation is correct. This is important in
 general but especially so for the pipelined implementation, which is
 non-trivial.
 
-## The "Upsert" Benchmark
+## The upsert benchmarks
 
-Performance requirement 6 states:
+Item 6 of the performance requirements states the following:
 
 > A benchmark should demonstrate that the performance characteristics of the
 > monoidial update operation should be similar to that of the insert or delete
-> operations, and substantially better than the combination of a lookup
-> followed by an insert.
-
-The `lsm-tree` library and documentation now uses the term "upsert" for this
-monoidal update operation, to follow standard database terminology.
-
-Based on the requirement above there are two (pairs of) benchmarks:
-
-1. A benchmark of the time to insert a (large) number of (batches of) key-value
-pairs; and a benchmark of the time to upsert the same sequence of
-key-value pairs.
-
-2. A benchmark of the time to repeatedly upsert values for a set of keys (so
-each key is updated several times); and a benchmark of the time to
-repeatedly update the same keys by the combination of lookup and insert
-(with accumulation). This uses lookups and inserts in batches. The set of
-keys is looked up, and the existing values are combined with the new values.
-The same set of key-value pairs are used 10 times, so that there are 10
-updates per key (either lookup and insert, or upsert).
-
-Each benchmark uses:
-
-* 64bit keys and values
-* values are combined using addition;
-* 80,000 elements (generated using a PRNG),
-* batches of size 250;
-* no disk caching;
-* a write buffer of 1,000 elements.
-
-These benchmarks are implemented using `criterion`, which performs multiple
-runs and combines the results in a sound statistical manner. The reported
-variance was relatively low. The benchmarks were executed on the dev laptop
-machine, however the absolute times of these benchmarks is of little interest.
-The interesting point is the relative timings.
-
-The result are as follows
-
-1. Less than 0.4 % difference in timing between insert and upsert (932.8ms vs
-929.4ms). This clearly qualifies as "similar".
-
-2. The combination of lookup and insert takes 2.4 times as long as using upsert
-(2.857s vs 1.188s). We can thus reasonably conclude that the performance of
-upsert is "substantially better" than the combination of a lookup followed
-by an insert.
+> operations, and substantially better than the combination of a lookup followed
+> by an insert.
+
+As already mentioned in [the discussion on functional
+requirement 4](#requirement-4), the `lsm-tree` library and its documentation now
+use the term ‘upsert’ for this monoidal update operation, to follow standard
+database terminology.
+
+In line with the above requirement, we have created the following benchmarks:
+
+* A benchmark of the time to insert a large number of key–value pairs using the
+insert operation and a benchmark of the time to insert the same key–value
+pairs using the upsert operation
+
+* A benchmark of the time to repeatedly upsert values of certain keys and a
+benchmark of the time to repeatedly update the values of these keys by looking
+up their current values, modifying them and writing them back
+
+The benchmarks use the following parameters:
+
+* 64 bit as the size of keys and values
+* 80,000 elements (generated using a PRNG)
+* Addition as the update operation
+* 250 operations per batch
+* No disk caching
+* 1,000 elements as the write buffer capacity
+* 10 updates per key in the case of the second pair of benchmarks
+
+The benchmarks are implemented using Criterion, which performs multiple
+benchmark runs and combines the results in a sound statistical manner. For our
+benchmarks, the variance of the results across the different runs, as reported
+by Criterion, is relatively low. We have executed the benchmarks on the dev
+laptop machine. However, the absolute running times of these benchmarks are of
+little interest; the interesting point is the relative timings.
+
+The results are as follows:
+
+* The difference in running time between the insert and the corresponding upsert
+benchmark is less than 0.4 % (932.8 ms vs. 929.4 ms), so that insert and
+upsert performance clearly qualify as ‘similar’.
+
+* Using the combination of lookup and insert takes 2.4 times as long as using
+upsert (2.857 s vs. 1.188 s). We can thus reasonably conclude that the
+performance of upsert is ‘substantially better’ than the performance of a
+lookup followed by an insert.
 
 # References {-}
 
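As an illustration of what the second pair of benchmarks compares, the following is a minimal sketch that models a table as an in-memory `Data.Map` from the `containers` package. It does not use the `lsm-tree` API: the names `Table`, `upsert`, `lookupThenInsert`, `updates` and `relDiffPercent` are hypothetical, and the workload is scaled down to 5 keys (with 10 updates per key, as in the benchmark parameters). The sketch shows that both strategies produce the same table when values are combined by addition, and it re-checks the two reported ratios from the quoted timings.

```haskell
module Main (main) where

import qualified Data.Map.Strict as Map
import Data.Word (Word64)

-- Hypothetical in-memory stand-in for a table; this is NOT the lsm-tree API.
type Table = Map.Map Word64 Word64

-- Upsert: a single write that combines the new value with any existing one.
upsert :: Word64 -> Word64 -> Table -> Table
upsert = Map.insertWith (+)

-- Lookup followed by insert: read the current value, combine, write back.
lookupThenInsert :: Word64 -> Word64 -> Table -> Table
lookupThenInsert k v t =
  case Map.lookup k t of
    Nothing  -> Map.insert k v t
    Just old -> Map.insert k (old + v) t

-- Scaled-down workload (assumption): 5 keys, each updated 10 times by 1.
updates :: [(Word64, Word64)]
updates = concat (replicate 10 [(k, 1) | k <- [1 .. 5]])

viaUpsert, viaLookup :: Table
viaUpsert = foldl (\t (k, v) -> upsert k v t) Map.empty updates
viaLookup = foldl (\t (k, v) -> lookupThenInsert k v t) Map.empty updates

-- Relative difference of two timings, in percent of the smaller one.
relDiffPercent :: Double -> Double -> Double
relDiffPercent a b = abs (a - b) / min a b * 100

main :: IO ()
main = do
  -- Both update strategies should yield the same final table.
  print (viaUpsert == viaLookup)
  -- Insert vs. upsert timings quoted in the results (in ms).
  print (relDiffPercent 932.8 929.4)
  -- Lookup-and-insert vs. upsert timings quoted in the results (in s).
  print (2.857 / 1.188 :: Double)
```

The sketch only captures the semantics. The performance gap reported above presumably stems from the lookup-based variant having to read each current value before writing it back, whereas an upsert is a write only; an in-memory map does not model that cost.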