Commit 191bec3

add readme
1 parent 2753a1f commit 191bec3

3 files changed

+77
-0
lines changed

docs/assets/images/(2,4)tree.jpg

@@ -0,0 +1,77 @@
# B-Trees

## Background
Is storing data in an array, sorting it, and then running binary search the fastest way to search for data? No. <br>

We have seen binary search trees (BSTs), which always maintain data in sorted order. This lets us avoid the
overhead of sorting before we search. However, we also learnt that unbalanced BSTs can be very inefficient for
insertion, deletion and search operations, which are O(h) in time complexity, where h is the height of the tree
(i.e. they can degrade to O(n) for unbalanced BSTs). <br>

Then, we learnt about self-balancing BSTs such as AVL trees, which cap the time complexity of insertion,
deletion and search operations at O(h) = O(log n). <br>

A B-tree is another type of self-balancing search tree data structure that maintains sorted data and allows for
efficient insertion, deletion and search operations.

### (a,b) trees

Before we talk about B-trees, we first introduce the family they belong to (their generalized form): (a,b) trees. <br>

- In an (a,b) tree, a and b refer to the minimum and maximum number of children of an internal node in the tree. <br>
- a and b are parameters where 2 <= a <= (b+1)/2.

Note that unlike binary trees, in (a,b) trees, each node can have more than 2 children and each node can store multiple
keys.

Here is a (2,4) tree to aid visualisation as we go through the (a,b) tree rules/invariants.
![(2,4) tree](../../../../../docs/assets/images/(2,4)tree.jpg)
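
As a rough illustration of the definition above, the following Python sketch checks the parameter constraint 2 <= a <= (b+1)/2 and shows one possible node layout. The names `valid_parameters` and `Node` are hypothetical, chosen here for illustration only.

```python
from dataclasses import dataclass, field


def valid_parameters(a: int, b: int) -> bool:
    """Return True if (a, b) satisfies 2 <= a <= (b + 1) / 2."""
    # Multiply through by 2 so the check stays in integer arithmetic.
    return 2 <= a and 2 * a <= b + 1


@dataclass
class Node:
    """Hypothetical node layout: sorted keys plus child pointers."""
    keys: list = field(default_factory=list)      # sorted keys stored in this node
    children: list = field(default_factory=list)  # empty list for a leaf

    def is_leaf(self) -> bool:
        return not self.children
```

Under this check, (2,4) and (2,3) are valid parameter pairs, while (3,4) is not, because 3 > (4+1)/2.
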

### Implementation Invariant/(a,b) Tree Rules
Rule #1: (a,b)-child Policy

The minimum and maximum number of keys and children each node can have are bounded as follows:
![(a,b) child policy](../../../../../docs/assets/images/(a,b)childpolicy.jpg)

Note: Every non-leaf node has exactly one more child than it has keys; leaves have no children.
(See rule 2)

The minimum height of an (a,b) tree is O(log_b(n)) and the maximum height is O(log_a(n)), where n is the number
of keys. <br>

How do we pick the values of a and b? b depends on the hardware (for example, a node is often sized to fit one
disk block), and for a given b we want to maximise a to make the tree fatter
and shorter.
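
To make the height bounds concrete, here is a small Python sketch. It assumes the usual (a,b)-child policy (the policy table above is an image, so this is an assumption): the root has at least 1 key and 2 children, every other node has at least a-1 keys and a children, and every node has at most b-1 keys and b children. Height is counted in edges, so a single-node tree has height 0. All function names are hypothetical.

```python
def max_keys(b: int, height: int) -> int:
    """Most keys a tree of the given height can hold (every node full)."""
    # Level d has b**d nodes with b - 1 keys each; the sum telescopes.
    return b ** (height + 1) - 1


def min_keys(a: int, height: int) -> int:
    """Fewest keys a tree of the given height can hold."""
    # Root: 1 key and 2 children; every other node: a - 1 keys and a children.
    return 2 * a ** height - 1


def min_height(b: int, n: int) -> int:
    """Smallest height that can hold n keys (grows like log_b(n))."""
    h = 0
    while max_keys(b, h) < n:
        h += 1
    return h


def max_height(a: int, n: int) -> int:
    """Largest height a tree with n keys can have (grows like log_a(n))."""
    h = 0
    while min_keys(a, h + 1) <= n:
        h += 1
    return h
```

Under these assumptions, a (2,4) tree holding 1000 keys has a height between `min_height(4, 1000) == 4` and `max_height(2, 1000) == 8`.
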

Rule #2: Key ranges

A non-leaf node (i.e. root or internal) must have one more child than its number of keys. This ensures that all
value ranges delimited by its keys are covered by its subtrees.

The permitted range of keys within a subtree is referred to as its key range.

Specifically, for a non-leaf node with k keys and (k+1) children:
- its keys in sorted order are v_1, v_2, ..., v_k
- its subtrees, in order, are t_1, t_2, ..., t_(k+1)

Then:
- the first child t_1 has key range <= v_1
- the final child t_(k+1) has key range > v_k
- every other child t_i has key range (v_(i-1), v_i], i.e. greater than v_(i-1) and at most v_i
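
The key-range rule determines which child to descend into for a given key. A minimal Python sketch, assuming keys equal to a separator belong to the subtree on its left (consistent with the first child having key range <= v_1); `child_index` is a hypothetical helper name and indices are 0-based:

```python
from bisect import bisect_left


def child_index(keys: list, x) -> int:
    """Return the 0-based index of the child whose key range contains x.

    bisect_left counts the keys strictly less than x, which is exactly
    the number of separators x has to pass on its way to the right.
    """
    return bisect_left(keys, x)


# A node with keys v_1=10, v_2=20, v_3=30 has four children t_1..t_4.
keys = [10, 20, 30]
first = child_index(keys, 10)   # 0: falls in t_1 (key range <= 10)
middle = child_index(keys, 11)  # 1: falls in t_2 (10 < key <= 20)
last = child_index(keys, 31)    # 3: falls in t_4 (key range > 30)
```
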

Rule #3: Leaf depth

All leaf nodes must be at the same depth from the root.
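
One way to check this rule is to collect the depth of every leaf and confirm that only one distinct depth appears. The sketch below is illustrative only: it represents a node as a hypothetical `(keys, children)` tuple, where a leaf has an empty list of children.

```python
def leaf_depths(node, depth: int = 0) -> set:
    """Return the set of depths at which leaves occur under node."""
    keys, children = node
    if not children:  # a leaf: record its depth
        return {depth}
    depths = set()
    for child in children:
        depths |= leaf_depths(child, depth + 1)
    return depths


def satisfies_leaf_depth_rule(root) -> bool:
    """True if every leaf is at the same depth from the root."""
    return len(leaf_depths(root)) == 1


# Both leaves at depth 1: the rule holds.
balanced = ([10], [([5], []), ([15], [])])
# One leaf at depth 1 and two at depth 2: the rule is violated.
unbalanced = ([10], [([5], []), ([15], [([12], []), ([20], [])])])
```
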

## Complexity Analysis
Search:

**Time**: O(b log_a(n)) = O(log n), treating a and b as constants

- The max height of an (a,b) tree is O(log_a(n)).
- A linear search within a node examines at most b - 1 keys, i.e. O(b) work per level.
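
The bound can be seen in a minimal search sketch: one linear scan over a node's keys per level, then a descent into a single child. This is an illustration under assumed conventions, not a reference implementation; a node is a hypothetical `(keys, children)` tuple with sorted keys, and a leaf has no children.

```python
def search(node, target) -> bool:
    """Return True if target is stored somewhere in the tree rooted at node."""
    keys, children = node
    # Linear scan: at most len(keys) <= b - 1 comparisons in this node.
    i = 0
    while i < len(keys) and target > keys[i]:
        i += 1
    if i < len(keys) and keys[i] == target:
        return True
    if not children:  # reached a leaf without finding the target
        return False
    # Descend one level into the child whose key range contains target.
    return search(children[i], target)


# A small (2,4) tree: a root with two keys and three leaf children.
tree = ([20, 40], [([5, 10], []), ([25, 30, 35], []), ([50], [])])
```

Each call does O(b) work and recurses at most once per level, so the total is O(b) times the height, matching the bound above.
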

**Space**: O(n)

where n is the number of elements (whatever the structure, it must store all n elements)

## References
This description heavily references CS2040S Recitation Sheet 4.
