-
Notifications
You must be signed in to change notification settings - Fork 1
algorithms
All the benchmarking is done on a Thinkpad 440s laptop with a dual-core 2.10GHz Intel Core i7-4600U processor and 8 GB DDR3 memory. The laptop is running Ubuntu 14.04.
I have tried two approaches to augment a B-tree. One is issue-6465-opt branch. The other is rightmosts-guided-bst branch. The difference between these two approaches is that the latter uses a rightmost interval end array to guide the binary search over node.interfaces. In the following benchstat report, the first column is for rightmosts-guided-bst branch. The second column is for issue-6465-opt branch.
name old time/op new time/op delta
Insert-4 949ns ± 0% 11337ns ± 0% +1094.60% (p=0.016 n=5+4)
FastInsert-4 890ns ± 1% 966ns ± 0% +8.51% (p=0.016 n=5+4)
Delete-4 1.10µs ± 1% 11.69µs ± 1% +966.51% (p=0.008 n=5+5)
Get-4 3.68µs ± 1% 6.36µs ± 2% +72.84% (p=0.008 n=5+5)
RandomInsert-4 3.05µs ± 2% 9.87µs ± 2% +224.09% (p=0.008 n=5+5)
RandomFastInsert-4 2.93µs ± 2% 3.01µs ± 2% +2.65% (p=0.032 n=5+5)
RandomDelete-4 3.28µs ± 1% 13.56µs ± 2% +313.00% (p=0.008 n=5+5)
RandomGet-4 945µs ± 4% 1791µs ± 1% +89.53% (p=0.008 n=5+5)
GetFrom1k-4 3.78µs ± 5% 5.48µs ± 2% +45.11% (p=0.008 n=5+5)
GetFrom10k-4 4.80µs ± 1% 7.18µs ± 0% +49.67% (p=0.008 n=5+5)
GetFrom100k-4 3.70µs ± 1% 6.19µs ± 0% +67.39% (p=0.016 n=5+4)
GetFrom1000k-4 3.96µs ± 1% 6.74µs ± 1% +70.05% (p=0.008 n=5+5)
InsertWithSmallTree-4 2.28µs ± 2% 3.44µs ± 1% +50.90% (p=0.008 n=5+5)
FastInsertWithSmallTree-4 2.27µs ± 1% 2.99µs ± 1% +31.62% (p=0.008 n=5+5)
InsertAndDeleteWithSmallTree-4 3.42µs ± 1% 4.56µs ± 1% +33.48% (p=0.008 n=5+5)
InsertAndGetWithSmallTree-4 5.59µs ± 2% 7.51µs ± 1% +34.34% (p=0.008 n=5+5)
The maintenance of the rightmost interval end array makes Insert and Delete methods slow. Its use also complicates the traversal of the interval tree. As a result, even the traversal is slower. If the array is updated incrementally instead of being rebuilt every time, Insert and Delete methods might run faster. But I can't figure out how to speed up the traversal. So I think that the latter is not a viable approach.
issue-6465 branch is a basic implementation of the former approach. It has a slower Get method compared to LLRB based interval tree. Profiling shows that the function call of ToLeft slows it down. The removal of ToLeft function call results issue-6465-opt branch. Here is the result reported by benchstat. The first column is for issue-6465 branch. The second column is for issue-6465-opt branch:
name old time/op new time/op delta
Insert-4 949ns ± 1% 946ns ± 0% ~ (p=0.238 n=5+5)
FastInsert-4 887ns ± 0% 889ns ± 0% ~ (p=0.246 n=5+5)
Delete-4 1.10µs ± 0% 1.10µs ± 0% ~ (p=0.286 n=5+5)
Get-4 4.76µs ± 1% 3.73µs ± 2% -21.67% (p=0.008 n=5+5)
RandomInsert-4 3.04µs ± 1% 2.99µs ± 1% -1.68% (p=0.016 n=5+4)
RandomFastInsert-4 2.92µs ± 2% 2.94µs ± 2% ~ (p=0.651 n=5+5)
RandomDelete-4 3.31µs ± 4% 3.30µs ± 5% ~ (p=1.000 n=5+5)
RandomGet-4 1.06ms ± 1% 0.96ms ± 4% -9.20% (p=0.008 n=5+5)
GetFrom1k-4 4.32µs ± 5% 3.68µs ± 2% -14.68% (p=0.008 n=5+5)
GetFrom10k-4 5.58µs ± 2% 4.81µs ± 1% -13.73% (p=0.008 n=5+5)
GetFrom100k-4 4.69µs ± 0% 3.69µs ± 0% -21.34% (p=0.016 n=4+5)
GetFrom1000k-4 5.04µs ± 0% 3.92µs ± 0% -22.05% (p=0.016 n=4+5)
InsertWithSmallTree-4 2.35µs ± 8% 2.30µs ± 1% ~ (p=0.579 n=5+5)
FastInsertWithSmallTree-4 2.29µs ± 2% 2.27µs ± 1% ~ (p=0.286 n=4+5)
InsertAndDeleteWithSmallTree-4 3.40µs ± 1% 3.41µs ± 0% ~ (p=0.413 n=4+5)
InsertAndGetWithSmallTree-4 6.06µs ±10% 5.65µs ± 1% -6.75% (p=0.008 n=5+5)
The following report is produced by benchmarking B-tree degrees:
name \ time/op degree_2 degree_4 degree_8 degree_16 degree_32 degree_64 degree_128 degree_256
Insert-4 2.40µs ± 0% 1.39µs ± 0% 1.12µs ± 0% 1.01µs ± 0% 0.95µs ± 0% 0.93µs ± 0% 0.89µs ± 1% 0.90µs ± 1%
FastInsert-4 2.17µs ± 3% 1.26µs ± 0% 1.03µs ± 1% 1.06µs ± 7% 0.89µs ± 0% 0.87µs ± 0% 0.83µs ± 0% 0.83µs ± 0%
Delete-4 2.43µs ± 2% 1.56µs ± 2% 1.27µs ± 0% 1.18µs ± 0% 1.11µs ± 0% 1.07µs ± 0% 1.06µs ± 0% 1.08µs ± 1%
Get-4 4.66µs ± 3% 3.58µs ± 4% 3.39µs ± 1% 3.40µs ± 2% 3.66µs ± 1% 5.15µs ± 1% 5.87µs ± 0% 10.08µs ± 0%
The following diagram puts the above report into a chart. Average means the average of the above four methods. When degree is 32, Average is minimum. If we assume that these four methods are used uniformly, degree 32 is a reasonable choice.

For all the reports in this section, the first column is for LLRB based interval tree. The second column is B-tree based interval tree.
name old time/op new time/op delta
Insert-4 1.41µs ± 2% 1.00µs ± 6% -28.99% (p=0.000 n=8+10)
FastInsert-4 949ns ±10% 969ns ±14% ~ (p=0.305 n=10+10)
Delete-4 3.15µs ± 2% 1.17µs ± 8% -62.96% (p=0.000 n=9+10)
Get-4 4.21µs ± 2% 3.88µs ±12% -7.83% (p=0.008 n=9+10)
GetFrom1k-4 3.73µs ± 2% 3.36µs ±10% -9.96% (p=0.000 n=9+10)
GetFrom10k-4 4.64µs ± 3% 4.47µs ±20% ~ (p=0.089 n=10+10)
GetFrom100k-4 3.97µs ± 8% 3.81µs ±10% ~ (p=0.183 n=10+10)
GetFrom1000k-4 4.64µs ± 5% 4.16µs ±11% -10.46% (p=0.001 n=9+10)
InsertWithSmallTree-4 1.77µs ± 9% 2.06µs ± 9% +16.65% (p=0.000 n=10+10)
FastInsertWithSmallTree-4 1.70µs ± 8% 2.04µs ±10% +20.25% (p=0.000 n=10+10)
InsertAndDeleteWithSmallTree-4 2.42µs ± 6% 3.28µs ±11% +35.08% (p=0.000 n=9+10)
InsertAndGetWithSmallTree-4 5.05µs ± 2% 5.12µs ±11% ~ (p=0.360 n=8+10)
The conclusion is that LLRB based interval tree is faster than B-tree based interval tree for small trees. But for large trees, B-tree based interval tree is faster, especially for Delete method.
The following reports are produced with the use of random byte slices with a random length. Report for random byte slice with a length between 1 and 16:
name old time/op new time/op delta
RandomInsert-4 4.10µs ± 4% 2.52µs ± 3% -38.48% (p=0.000 n=10+10)
RandomFastInsert-4 2.96µs ± 4% 2.42µs ± 3% -18.11% (p=0.000 n=10+10)
RandomDelete-4 5.88µs ± 2% 2.58µs ± 2% -56.05% (p=0.000 n=9+9)
RandomGet-4 2.02ms ± 7% 1.04ms ± 2% -48.45% (p=0.000 n=10+9)
Report for random byte slice with a length between 1 and 32:
name old time/op new time/op delta
RandomInsert-4 4.17µs ± 3% 2.77µs ± 4% -33.64% (p=0.000 n=10+10)
RandomFastInsert-4 2.95µs ± 6% 2.63µs ± 3% -10.61% (p=0.000 n=10+9)
RandomDelete-4 6.13µs ± 2% 2.75µs ± 2% -55.16% (p=0.000 n=10+9)
RandomGet-4 1.92ms ± 4% 1.17ms ± 6% -38.89% (p=0.000 n=10+9)
Report for random byte slice with a length between 1 and 64:
name old time/op new time/op delta
RandomInsert-4 4.45µs ± 4% 2.82µs ±10% -36.61% (p=0.000 n=10+10)
RandomFastInsert-4 3.13µs ± 4% 2.74µs ± 7% -12.41% (p=0.000 n=10+10)
RandomDelete-4 6.28µs ± 3% 2.79µs ± 4% -55.65% (p=0.000 n=10+10)
RandomGet-4 1.94ms ± 6% 1.41ms ± 6% -27.32% (p=0.000 n=10+10)
Report for random byte slice with a length between 1 and 128:
name old time/op new time/op delta
RandomInsert-4 4.81µs ± 6% 2.94µs ± 5% -38.95% (p=0.000 n=10+10)
RandomFastInsert-4 3.34µs ± 4% 2.76µs ± 5% -17.24% (p=0.000 n=9+10)
RandomDelete-4 6.78µs ± 3% 2.88µs ± 1% -57.46% (p=0.000 n=10+8)
RandomGet-4 1.95ms ± 7% 1.36ms ± 8% -30.20% (p=0.000 n=10+10)
Report for random byte slice with a length between 1 and 256:
name old time/op new time/op delta
RandomInsert-4 4.78µs ± 6% 3.11µs ± 8% -34.90% (p=0.000 n=10+9)
RandomFastInsert-4 3.38µs ± 6% 2.89µs ±10% -14.38% (p=0.000 n=10+10)
RandomDelete-4 6.71µs ± 3% 3.02µs ± 3% -54.94% (p=0.000 n=10+10)
RandomGet-4 1.89ms ± 7% 1.19ms ±12% -37.32% (p=0.000 n=10+10)
Report for random byte slice with a length between 1 and 512:
name old time/op new time/op delta
RandomInsert-4 4.18µs ± 4% 3.11µs ± 6% -25.60% (p=0.000 n=10+10)
RandomFastInsert-4 3.48µs ± 9% 3.03µs ± 8% -13.07% (p=0.000 n=10+10)
RandomDelete-4 6.70µs ± 3% 3.22µs ± 5% -52.01% (p=0.000 n=10+10)
RandomGet-4 1.83ms ± 2% 0.98ms ± 7% -46.37% (p=0.000 n=9+10)
Report for random byte slice with a length between 1 and 1024:
name old time/op new time/op delta
RandomInsert-4 4.41µs ±11% 3.18µs ±12% -28.01% (p=0.000 n=10+10)
RandomFastInsert-4 3.52µs ± 5% 3.02µs ± 7% -14.24% (p=0.000 n=10+10)
RandomDelete-4 6.86µs ± 4% 3.43µs ± 5% -50.03% (p=0.000 n=10+10)
RandomGet-4 1.82ms ± 3% 0.98ms ± 3% -46.35% (p=0.000 n=10+10)
All the reports for random byte slices also indicate that B-tree based interval tree is faster than LLRB based interval tree.
- Do a
256way or256^2branching based on the first byte or the first two bytes. - Determine
common prefix lengthfor node range start and end. Thensort.Searchonly need to do byte slice comparison starting from thecommon prefix length.
@petermattis, please give your comments. If you think that issue-6465-opt branch is the right way to go. I will go proceed to dig into these optimizations.