Skip to content
Jingguo Yao edited this page Aug 27, 2016 · 57 revisions

All the benchmarking is done on a Thinkpad 440s laptop with a dual-core 2.10GHz Intel Core i7-4600U processor and 8 GB DDR3 memory. The laptop is running Ubuntu 14.04.

Augmentation Approaches

I have tried two approaches to augment a B-tree. One is issue-6465-opt branch. The other is rightmosts-guided-bst branch. The difference between these two approaches is that the latter uses a rightmost interval end array to guide the binary search over node.interfaces. In the following benchstat report, the first column is for rightmosts-guided-bst branch. The second column is for issue-6465-opt branch.

name                            old time/op  new time/op   delta
Insert-4                         949ns ± 0%  11337ns ± 0%  +1094.60%  (p=0.016 n=5+4)
FastInsert-4                     890ns ± 1%    966ns ± 0%     +8.51%  (p=0.016 n=5+4)
Delete-4                        1.10µs ± 1%  11.69µs ± 1%   +966.51%  (p=0.008 n=5+5)
Get-4                           3.68µs ± 1%   6.36µs ± 2%    +72.84%  (p=0.008 n=5+5)
RandomInsert-4                  3.05µs ± 2%   9.87µs ± 2%   +224.09%  (p=0.008 n=5+5)
RandomFastInsert-4              2.93µs ± 2%   3.01µs ± 2%     +2.65%  (p=0.032 n=5+5)
RandomDelete-4                  3.28µs ± 1%  13.56µs ± 2%   +313.00%  (p=0.008 n=5+5)
RandomGet-4                      945µs ± 4%   1791µs ± 1%    +89.53%  (p=0.008 n=5+5)
GetFrom1k-4                     3.78µs ± 5%   5.48µs ± 2%    +45.11%  (p=0.008 n=5+5)
GetFrom10k-4                    4.80µs ± 1%   7.18µs ± 0%    +49.67%  (p=0.008 n=5+5)
GetFrom100k-4                   3.70µs ± 1%   6.19µs ± 0%    +67.39%  (p=0.016 n=5+4)
GetFrom1000k-4                  3.96µs ± 1%   6.74µs ± 1%    +70.05%  (p=0.008 n=5+5)
InsertWithSmallTree-4           2.28µs ± 2%   3.44µs ± 1%    +50.90%  (p=0.008 n=5+5)
FastInsertWithSmallTree-4       2.27µs ± 1%   2.99µs ± 1%    +31.62%  (p=0.008 n=5+5)
InsertAndDeleteWithSmallTree-4  3.42µs ± 1%   4.56µs ± 1%    +33.48%  (p=0.008 n=5+5)
InsertAndGetWithSmallTree-4     5.59µs ± 2%   7.51µs ± 1%    +34.34%  (p=0.008 n=5+5)

The maintenance of the rightmost interval end array makes Insert and Delete methods slow. Its use also complicates the traversal of the interval tree. As a result, even the traversal is slower. If the array is updated incrementally instead of being rebuilt every time, Insert and Delete methods might run faster. But I can't figure out how to speed up the traversal. So I think that the latter is not a viable approach.

Optimization

issue-6465 branch is a basic implementation of the former approach. It has a slower Get method compared to LLRB based interval tree. Profiling shows that the function call of ToLeft slows it down. The removal of ToLeft function call results issue-6465-opt branch. Here is the result reported by benchstat. The first column is for issue-6465 branch. The second column is for issue-6465-opt branch:

name                            old time/op  new time/op  delta
Insert-4                         949ns ± 1%   946ns ± 0%     ~     (p=0.238 n=5+5)
FastInsert-4                     887ns ± 0%   889ns ± 0%     ~     (p=0.246 n=5+5)
Delete-4                        1.10µs ± 0%  1.10µs ± 0%     ~     (p=0.286 n=5+5)
Get-4                           4.76µs ± 1%  3.73µs ± 2%  -21.67%  (p=0.008 n=5+5)
RandomInsert-4                  3.04µs ± 1%  2.99µs ± 1%   -1.68%  (p=0.016 n=5+4)
RandomFastInsert-4              2.92µs ± 2%  2.94µs ± 2%     ~     (p=0.651 n=5+5)
RandomDelete-4                  3.31µs ± 4%  3.30µs ± 5%     ~     (p=1.000 n=5+5)
RandomGet-4                     1.06ms ± 1%  0.96ms ± 4%   -9.20%  (p=0.008 n=5+5)
GetFrom1k-4                     4.32µs ± 5%  3.68µs ± 2%  -14.68%  (p=0.008 n=5+5)
GetFrom10k-4                    5.58µs ± 2%  4.81µs ± 1%  -13.73%  (p=0.008 n=5+5)
GetFrom100k-4                   4.69µs ± 0%  3.69µs ± 0%  -21.34%  (p=0.016 n=4+5)
GetFrom1000k-4                  5.04µs ± 0%  3.92µs ± 0%  -22.05%  (p=0.016 n=4+5)
InsertWithSmallTree-4           2.35µs ± 8%  2.30µs ± 1%     ~     (p=0.579 n=5+5)
FastInsertWithSmallTree-4       2.29µs ± 2%  2.27µs ± 1%     ~     (p=0.286 n=4+5)
InsertAndDeleteWithSmallTree-4  3.40µs ± 1%  3.41µs ± 0%     ~     (p=0.413 n=4+5)
InsertAndGetWithSmallTree-4     6.06µs ±10%  5.65µs ± 1%   -6.75%  (p=0.008 n=5+5)

B-Tree Degree Benchmark

The following report is produced by benchmarking B-tree degrees:

name \ time/op  degree_2     degree_4     degree_8     degree_16    degree_32    degree_64    degree_128   degree_256
Insert-4        2.40µs ± 0%  1.39µs ± 0%  1.12µs ± 0%  1.01µs ± 0%  0.95µs ± 0%  0.93µs ± 0%  0.89µs ± 1%   0.90µs ± 1%
FastInsert-4    2.17µs ± 3%  1.26µs ± 0%  1.03µs ± 1%  1.06µs ± 7%  0.89µs ± 0%  0.87µs ± 0%  0.83µs ± 0%   0.83µs ± 0%
Delete-4        2.43µs ± 2%  1.56µs ± 2%  1.27µs ± 0%  1.18µs ± 0%  1.11µs ± 0%  1.07µs ± 0%  1.06µs ± 0%   1.08µs ± 1%
Get-4           4.66µs ± 3%  3.58µs ± 4%  3.39µs ± 1%  3.40µs ± 2%  3.66µs ± 1%  5.15µs ± 1%  5.87µs ± 0%  10.08µs ± 0%

The following diagram puts the above report into a chart. Average means the average of the above four methods. When degree is 32, Average is minimum. If we assume that these four methods are used uniformly, degree 32 is a reasonable choice.

degree

Comparison with LLRB based Interval Tree

For all the reports in this section, the first column is for LLRB based interval tree. The second column is B-tree based interval tree.

name                            old time/op  new time/op  delta
Insert-4                        1.41µs ± 2%  1.00µs ± 6%  -28.99%   (p=0.000 n=8+10)
FastInsert-4                     949ns ±10%   969ns ±14%     ~     (p=0.305 n=10+10)
Delete-4                        3.15µs ± 2%  1.17µs ± 8%  -62.96%   (p=0.000 n=9+10)
Get-4                           4.21µs ± 2%  3.88µs ±12%   -7.83%   (p=0.008 n=9+10)
GetFrom1k-4                     3.73µs ± 2%  3.36µs ±10%   -9.96%   (p=0.000 n=9+10)
GetFrom10k-4                    4.64µs ± 3%  4.47µs ±20%     ~     (p=0.089 n=10+10)
GetFrom100k-4                   3.97µs ± 8%  3.81µs ±10%     ~     (p=0.183 n=10+10)
GetFrom1000k-4                  4.64µs ± 5%  4.16µs ±11%  -10.46%   (p=0.001 n=9+10)
InsertWithSmallTree-4           1.77µs ± 9%  2.06µs ± 9%  +16.65%  (p=0.000 n=10+10)
FastInsertWithSmallTree-4       1.70µs ± 8%  2.04µs ±10%  +20.25%  (p=0.000 n=10+10)
InsertAndDeleteWithSmallTree-4  2.42µs ± 6%  3.28µs ±11%  +35.08%   (p=0.000 n=9+10)
InsertAndGetWithSmallTree-4     5.05µs ± 2%  5.12µs ±11%     ~      (p=0.360 n=8+10)

The conclusion is that LLRB based interval tree is faster than B-tree based interval tree for small trees. But for large trees, B-tree based interval tree is faster, especially for Delete method.

The following reports are produced with the use of random byte slices with a random length. Report for random byte slice with a length between 1 and 16:

name                old time/op  new time/op  delta
RandomInsert-4      4.10µs ± 4%  2.52µs ± 3%  -38.48%  (p=0.000 n=10+10)
RandomFastInsert-4  2.96µs ± 4%  2.42µs ± 3%  -18.11%  (p=0.000 n=10+10)
RandomDelete-4      5.88µs ± 2%  2.58µs ± 2%  -56.05%    (p=0.000 n=9+9)
RandomGet-4         2.02ms ± 7%  1.04ms ± 2%  -48.45%   (p=0.000 n=10+9)

Report for random byte slice with a length between 1 and 32:

name                old time/op  new time/op  delta
RandomInsert-4      4.17µs ± 3%  2.77µs ± 4%  -33.64%  (p=0.000 n=10+10)
RandomFastInsert-4  2.95µs ± 6%  2.63µs ± 3%  -10.61%   (p=0.000 n=10+9)
RandomDelete-4      6.13µs ± 2%  2.75µs ± 2%  -55.16%   (p=0.000 n=10+9)
RandomGet-4         1.92ms ± 4%  1.17ms ± 6%  -38.89%   (p=0.000 n=10+9)

Report for random byte slice with a length between 1 and 64:

name                old time/op  new time/op  delta
RandomInsert-4      4.45µs ± 4%  2.82µs ±10%  -36.61%  (p=0.000 n=10+10)
RandomFastInsert-4  3.13µs ± 4%  2.74µs ± 7%  -12.41%  (p=0.000 n=10+10)
RandomDelete-4      6.28µs ± 3%  2.79µs ± 4%  -55.65%  (p=0.000 n=10+10)
RandomGet-4         1.94ms ± 6%  1.41ms ± 6%  -27.32%  (p=0.000 n=10+10)

Report for random byte slice with a length between 1 and 128:

name                old time/op  new time/op  delta
RandomInsert-4      4.81µs ± 6%  2.94µs ± 5%  -38.95%  (p=0.000 n=10+10)
RandomFastInsert-4  3.34µs ± 4%  2.76µs ± 5%  -17.24%   (p=0.000 n=9+10)
RandomDelete-4      6.78µs ± 3%  2.88µs ± 1%  -57.46%   (p=0.000 n=10+8)
RandomGet-4         1.95ms ± 7%  1.36ms ± 8%  -30.20%  (p=0.000 n=10+10)

Report for random byte slice with a length between 1 and 256:

name                old time/op  new time/op  delta
RandomInsert-4      4.78µs ± 6%  3.11µs ± 8%  -34.90%   (p=0.000 n=10+9)
RandomFastInsert-4  3.38µs ± 6%  2.89µs ±10%  -14.38%  (p=0.000 n=10+10)
RandomDelete-4      6.71µs ± 3%  3.02µs ± 3%  -54.94%  (p=0.000 n=10+10)
RandomGet-4         1.89ms ± 7%  1.19ms ±12%  -37.32%  (p=0.000 n=10+10)

Report for random byte slice with a length between 1 and 512:

name                old time/op  new time/op  delta
RandomInsert-4      4.18µs ± 4%  3.11µs ± 6%  -25.60%  (p=0.000 n=10+10)
RandomFastInsert-4  3.48µs ± 9%  3.03µs ± 8%  -13.07%  (p=0.000 n=10+10)
RandomDelete-4      6.70µs ± 3%  3.22µs ± 5%  -52.01%  (p=0.000 n=10+10)
RandomGet-4         1.83ms ± 2%  0.98ms ± 7%  -46.37%   (p=0.000 n=9+10)

Report for random byte slice with a length between 1 and 1024:

name                old time/op  new time/op  delta
RandomInsert-4      4.41µs ±11%  3.18µs ±12%  -28.01%  (p=0.000 n=10+10)
RandomFastInsert-4  3.52µs ± 5%  3.02µs ± 7%  -14.24%  (p=0.000 n=10+10)
RandomDelete-4      6.86µs ± 4%  3.43µs ± 5%  -50.03%  (p=0.000 n=10+10)
RandomGet-4         1.82ms ± 3%  0.98ms ± 3%  -46.35%  (p=0.000 n=10+10)

All the reports for random byte slices also indicate that B-tree based interval tree is faster than LLRB based interval tree.

Possible Optimizations

  1. Do a 256 way or 256^2 branching based on the first byte or the first two bytes.
  2. Determine common prefix length for node range start and end. Then sort.Search only need to do byte slice comparison starting from the common prefix length.

@petermattis, please give your comments. If you think that issue-6465-opt branch is the right way to go. I will go proceed to dig into these optimizations.

Clone this wiki locally